Importing Elk Data in Batches Using GDS

Publication Date: 2019-04-12

Issue Description

How do I use GDS to import Elk table data in batches?

Solution

1. Log in to the Elk cluster.

su - omm // Switch to user omm.

source /opt/huawei/Bigdata/mppdb/.mppdbgs_profile // Load the database environment variables.

gsql -d postgres -p 25108 -r // Connect to the postgres database on port 25108.

2. (Optional) Create a user.

create user user_sys WITH sysadmin identified by "Huawei@123"; // Create user user_sys with the sysadmin permission.

set user user_sys password 'Huawei@123'; // Log in to the system as user user_sys.

3. (Optional) Create a tablespace.

a. Log in to each Elk node as user root to create an empty directory, and set the owner to omm:wheel.

mkdir -p /mpp/bigdata/hdfs_test

chown -R omm:wheel /mpp/

b. Create a tablespace hdfs_test in the /mpp/bigdata/hdfs_test directory with parameters set as follows:

filesystem set to hdfs

cfgpath set to /opt/huawei/Bigdata/mppdb/conf

storepath set to /mppdb_data/distribute

hdfs_test is the newly created HDFS tablespace.

/mpp/bigdata/hdfs_test is an empty directory on which user omm has the read and write permissions.

/opt/huawei/Bigdata/mppdb/conf is the path of the HDFS cluster configuration file.

/mppdb_data/distribute is the directory for storing data on HDFS.

Note: Create the tablespace directory before you create the tablespace. Otherwise, the Elk cluster becomes faulty.
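The article lists the tablespace parameters but does not show the statement itself. The following is a minimal sketch of how the values above might be assembled into one statement. It assumes the CREATE TABLESPACE ... LOCATION ... WITH (filesystem, cfgpath, storepath) form; the helper name tablespace_sql is hypothetical, and the exact syntax should be checked against the SQL reference for your Elk version.

```shell
# Hypothetical helper (not part of Elk): prints a CREATE TABLESPACE statement.
# Assumed syntax: LOCATION '<dir>' WITH (filesystem=hdfs, cfgpath=..., storepath=...).
# Arguments: tablespace name, local directory, HDFS config path, HDFS store path.
tablespace_sql() {
  printf "CREATE TABLESPACE %s LOCATION '%s' WITH (filesystem=hdfs, cfgpath='%s', storepath='%s');\n" \
    "$1" "$2" "$3" "$4"
}

# Values taken from this article.
tablespace_sql hdfs_test /mpp/bigdata/hdfs_test \
  /opt/huawei/Bigdata/mppdb/conf /mppdb_data/distribute
```

The printed statement can be pasted into the gsql session opened in step 1.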

4. Create an internal table.

create table test (id int, name varchar(20)) tablespace hdfs_test;

5. Log in to the data source server, create the local directory /input_data, and copy the data file data.txt into it.

su - root

mkdir -p /input_data // Directory for storing original files.

chown -R omm:wheel /input_data
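The article does not show what data.txt contains. The sketch below assumes one row per line with two fields separated by a single space, matching the table test (id int, name varchar(20)) and the delimiter ' ' option used for the foreign table later on. The sample rows and the helper name check_data_file are illustrative only, and the demo writes to a temporary directory unless INPUT_DIR is set.

```shell
# Sketch: write a sample data file, then check that every line has exactly two
# single-space-separated fields: an integer id and a name of at most 20 characters.
input_dir="${INPUT_DIR:-$(mktemp -d)}"   # the article uses /input_data
printf '%s\n' '1 alice' '2 bob' '3 carol' > "$input_dir/data.txt"

# Hypothetical helper: prints offending lines and returns non-zero if any exist.
check_data_file() {
  awk -F'[ ]' '
    NF != 2 || $1 !~ /^-?[0-9]+$/ || length($2) > 20 {
      bad++
      print "line " NR ": " $0
    }
    END { if (bad) exit 1 }
  ' "$1"
}

check_data_file "$input_dir/data.txt" && echo "data.txt matches the expected layout"
```

Running a check like this before the import surfaces malformed rows early, rather than leaving them to be discovered in the error table after the import.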

6. (Optional) Install the GDS.

a. Decompress the Elk component installation package.

mkdir -p /opt/bin // Create the directory for installing the GDS service.

cd /FusionInsight_MPPDB/software/components/package/FusionInsight-MPPDB-x.x.x.tar.gz/package

tar -zxvf Gauss-MPPDB-ALL-PACKAGES.tar.gz // Decompress the package and select the GDS installation package of the required version.

cp Gauss200-OLAP-V100R006C10-SUSE-64bit-Gds.tar.gz /opt/bin

groupadd wheel  // Create a database user group.

useradd -g wheel omm // Create a database user

chown -R omm:wheel /opt/bin/gds

7. Start the GDS.

su - omm

/opt/bin/gds/gds -d /input_data -p 192.168.18.21:5000 -H 10.10.0.1/24 -D

/opt/bin/gds: GDS installation path.

/input_data: directory that stores the data files.

192.168.18.21: IP address of the data server.

5000: GDS listening port, which can be changed.

10.10.0.1/24: IP address or network segment of the hosts that are allowed to connect to the GDS. Set it so that it covers the Elk database nodes that are being used.

-D: runs the GDS as a daemon process.

8. Create a foreign table.

create foreign table foreign_test (id int, name varchar(20)) SERVER gsmpp_server OPTIONS (location 'gsfs://192.168.18.21:5000/*', format 'text', mode 'normal', encoding 'utf8', delimiter ' ', null '', fill_missing_fields 'false') LOG INTO err_HR_areaS PER NODE REJECT LIMIT 'unlimited';

9. Import data through associated foreign tables.

insert into test select * from foreign_test;
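Because the foreign table is declared with REJECT LIMIT 'unlimited', the import can finish even when some rows are rejected. One way to sanity-check the result is to count the source lines and compare that number with the rows that arrived in test and the rows logged to err_HR_areaS. The sketch below covers only the file-side count. The helper name count_source_rows is hypothetical, and the demo runs on a temporary directory rather than /input_data.

```shell
# Hypothetical helper: counts the lines in all regular files directly under a
# directory, mirroring the '/*' wildcard in the foreign table location.
count_source_rows() {
  find "$1" -maxdepth 1 -type f -exec cat {} + | wc -l | tr -d '[:space:]'
}

# Demo on a throwaway directory; the article's data directory is /input_data.
demo_dir="$(mktemp -d)"
printf '1 alice\n2 bob\n' > "$demo_dir/data.txt"
printf '3 carol\n' > "$demo_dir/more.txt"
echo "source rows: $(count_source_rows "$demo_dir")"
```

On the database side, select count(*) from test; gives the number of imported rows for the comparison, assuming the table was empty before the import.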

Note: The GDS service must be running while data is being imported. Stop the service after the import is complete.

Start the GDS.

python gds_ctl.py start

Stop the GDS started using the configuration file.

python gds_ctl.py stop

Stop all GDS services that the current user has the permission to close.

python gds_ctl.py stop all

Stop the GDS services specified by [ip:]port that the current user has the permission to close.

python gds_ctl.py stop 127.0.0.1:8098

Query the GDS status.

python gds_ctl.py status

cm_ctl stop -m i // Stop the Elk cluster.

cm_ctl start // Restart the Elk cluster.
