Wednesday, October 7, 2020

Sqoop Compression: Snappy and Avro Format

Run locate core-site.xml to find the Hadoop configuration file, which lists the available codec types.
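
For example, a quick way to see which codecs the cluster has enabled (a sketch, assuming the file lives under /etc/hadoop/conf, a common but not universal location):

$ locate core-site.xml
$ grep -A 2 "io.compression.codecs" /etc/hadoop/conf/core-site.xml   # lists the enabled codec classes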

Example of loading data from MySQL to HDFS (compression: Snappy and Avro format)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_snappy_avro \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
 --as-avrodatafile
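
To confirm that the output really is Snappy-compressed Avro, the file metadata can be inspected; a sketch, assuming the avro-tools jar is available on the local machine and the default part-m-00000.avro file name:

$ hdfs dfs -ls /ej_snappy_avro
$ hdfs dfs -get /ej_snappy_avro/part-m-00000.avro .
$ avro-tools getmeta part-m-00000.avro   # the avro.codec entry should report snappy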

Example of loading data from MySQL to HDFS (compression: gzip and Avro format)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_gzip_avro \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.GzipCodec \
 --as-avrodatafile

 

Example of loading data from MySQL to HDFS (compression: BZIP2 and Sequence format)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_bzip2_sequence \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
 --as-sequencefile
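
A compressed SequenceFile can be checked directly, since hdfs dfs -text decompresses and decodes it on the fly (assuming the default part-m-00000 file name):

$ hdfs dfs -text /ej_bzip2_sequence/part-m-00000 | head   # prints the records as plain text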

 

Example of loading data from MySQL to HDFS (restricting data with columns)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_2_columns \
 --columns name,age

 

Example of loading data from MySQL to HDFS (restricting data with WHERE)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_age_over_40 \
 --where "age > 40"

 

Example of loading data from MySQL to HDFS (incremental load)

In order to perform an incremental load we first need to add new data to the table "mytable"; to do so, execute the following statement in MySQL:

mysql> 
INSERT INTO mytable (name, age, salary) VALUES
        ("Diego", 24, 21000), ("Rosa", 26, 24000), ("Javier", 28, 25000), ("Lorena", 35, 28000), ("Miriam", 42, 30000), ("Patricia", 43, 25000), ("Natalia", 45, 39000);

Note: the insert must be executed in the database "myddbb".

Once the insert is done, we can run the incremental import with --last-value 7, the highest id already present in HDFS; Sqoop then fetches only rows with an id greater than 7, i.e. the new rows starting at id 8.

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/my_table_hdfs \
 --incremental append \
 --check-column id \
 --last-value 7
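
For recurring incremental loads, Sqoop can also persist the import state in a saved job, so the last value does not have to be tracked by hand. A minimal sketch, assuming the illustrative job name incremental_mytable:

$ sqoop job --create incremental_mytable \
 -- import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/my_table_hdfs \
 --incremental append \
 --check-column id \
 --last-value 0

$ sqoop job --exec incremental_mytable   # each run resumes from the stored last-value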

 

Example of loading data from MySQL to HDFS, queryable from Hive

In order to import the table into Hive, we must first create the database where it will be stored, to avoid problems:

hive> CREATE DATABASE mybbddhive;

Once the database is created, you are ready to run the import:

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_hive \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
 --hive-import \
 --hive-database mybbddhive \
 --create-hive-table \
 --hive-table ej_hive_table
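
Once the job finishes, the table can be queried from Hive to confirm the import:

hive> USE mybbddhive;
hive> SELECT * FROM ej_hive_table LIMIT 5;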

Creating a MySQL database table

A database with a test table is created to run the examples against, using the following commands.

Accessing MySQL

$ mysql

You may get "ERROR 1045 (28000): Access denied for user 'root'@'localhost'", which can be resolved by logging in with explicit credentials:

$ mysql -u root -p

Note: Keep in mind that the MySQL password and the system password can be different.

List databases

mysql> show databases;

Create Database

mysql> create database myddbb;

Use database

mysql> use myddbb;

Create base table

mysql> CREATE TABLE mytable (
         id MEDIUMINT NOT NULL AUTO_INCREMENT,
         name CHAR(30) NOT NULL,
         age INTEGER(30),
         salary INTEGER(30),
         PRIMARY KEY (id));

Create second table

mysql> CREATE TABLE mytable2 (
         id MEDIUMINT NOT NULL AUTO_INCREMENT,
         name CHAR(30) NOT NULL,
         age INTEGER(30),
         salary INTEGER(30),
         PRIMARY KEY (id));

Insert Data

mysql> INSERT INTO mytable (name, age, salary) VALUES
        ("Peter", 24, 21000), ("Maria", 26, 24000), ("John", 28, 25000), ("Louis", 35, 28000), ("Monica", 42, 30000), ("Rose", 43, 25000), ("Susana", 45, 39000);

Note: If you have permission problems, enter MySQL and grant all privileges:

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost' IDENTIFIED BY 'MySQL_Key' WITH GRANT OPTION;

Exit MYSQL

mysql> exit;

 

Load MySQL data to HDFS

Example of loading data from the table "mytable" of the database "myddbb" to the HDFS folder named "my_hdfs_table".

MySQL to HDFS

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/my_hdfs_table \
 --fields-terminated-by ',' \
 --lines-terminated-by '\n'

--target-dir: HDFS directory where the imported data is stored.
--table: identifies the table to be copied.
--clear-staging-table: indicates that any data present in the staging table can be deleted.
--verbose: prints additional information to facilitate debugging.
--fields-terminated-by: defines the field delimiter.
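
To check that the comma delimiter was applied, the imported file can be read back (a sketch; the part file name may vary):

$ hdfs dfs -cat /my_hdfs_table/part-m-00000 | head   # each line should read id,name,age,salary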

 

Loading data from HDFS to MySQL

Example of loading data from the HDFS folder named “my_hdfs_table” to the “mytable2” table in the “myddbb” database.


$ sqoop export \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable2 \
 --export-dir=/my_hdfs_table -m 1
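
The export can be verified back in MySQL; the row count of mytable2 should match the number of records in /my_hdfs_table:

mysql> SELECT COUNT(*) FROM mytable2;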

Note: If you hit the warning "Set $ACCUMULO_HOME to the root of your Accumulo installation", it can be avoided with:

$ ACCUMULO_HOME='/var/lib/accumulo'
$ export ACCUMULO_HOME
$ sudo mkdir /var/lib/accumulo

To practice further with Sqoop, consult "Examples of Sqoop", a section that compiles many examples you may find useful.

