Friday, October 30, 2020

hive Parquet to textfile

INSERT OVERWRITE DIRECTORY '/user/test'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u0001'
STORED AS TEXTFILE
SELECT nvl(id, ""), nvl(asd, "") FROM db.test;

Sunday, October 18, 2020

hive example query

CREATE DATABASE IF NOT EXISTS testdb LOCATION '/user/testdb.db';
DROP TABLE IF EXISTS testdb.emp;
CREATE TABLE testdb.emp
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '~' ESCAPED BY '\\134'
  LINES TERMINATED BY '\n'
  NULL DEFINED AS ' '
  STORED AS TEXTFILE
AS SELECT
  CASE WHEN ascii(empid) = 0 AND length(empid) > 0 THEN ' ' ELSE empid END AS empid,
  concat(nvl(empapproved, ' '), '@#$') AS empapproved
FROM xyzdb.abc
WHERE empid <> 1234;

Wednesday, October 7, 2020

sqoop compression (Snappy) and Avro format

Tip: check core-site.xml to find which compression codecs are available on the cluster.
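
For example, a minimal sketch (the path to core-site.xml is an assumption; it varies by distribution, /etc/hadoop/conf is common):

$ locate core-site.xml
$ grep -A 2 "io.compression.codecs" /etc/hadoop/conf/core-site.xml

The io.compression.codecs property lists the codec classes (for example org.apache.hadoop.io.compress.SnappyCodec) that can be passed to --compression-codec.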

Example of loading data from MySQL to HDFS (compression: Snappy and Avro format)

$ sqoop import \
 --connect jdbc:mysql://localhost:3306/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_snappy_avro \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
 --as-avrodatafile

Example of loading data from MySQL to HDFS (compression: gzip and Avro format)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_gzip_avro \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.GzipCodec \
 --as-avrodatafile

 

Example of loading data from MySQL to HDFS (compression: BZIP2 and Sequence format)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_bzip2_sequence \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
 --as-sequencefile

 

Example of loading data from MySQL to HDFS (restricting data with columns)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_2_columns \
 --columns name,age

 

Example of loading data from MySQL to HDFS (restricting data with WHERE)

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_mayor_age_40 \
 --where "age > 40"

 

Example of loading data from MySQL to HDFS (incremental load)

In order to perform an incremental load we need to add new data to the table "mytable"; to do this, execute the following statement in MySQL:

mysql> 
INSERT INTO mytable (name, age, salary) VALUES
        ("Diego", 24, 21000), ("Rosa", 26, 24000), ("Javier", 28, 25000), ("Lorena", 35, 28000), ("Miriam", 42, 30000), ("Patricia", 43, 25000), ("Natalia", 45, 39000);

Note: the insertion must be done in the database "myddbb".

Once the insertion is done we can run the incremental import. The new rows start at id 8, so we pass --last-value 7 (the last id already imported); Sqoop appends every row with a higher id.

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/my_table_hdfs \
 --incremental append \
 --check-column id \
 --last-value 7

 

Example of loading data from MySQL to HDFS, queryable from Hive

In order to load the table into Hive, we must first create the database it will be inserted into, to avoid problems:

hive> CREATE DATABASE mybbddhive;

Once the database is created, you are ready to run the query:

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/ej_hive \
 --compress \
 --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
 --hive-import \
 --hive-database mybbddhive \
 --create-hive-table \
 --hive-table ej_hive_table

Creating a MySQL database table

A database is created with a table on which to perform the tests; the following commands are used.

Accessing MySQL

$ mysql
You may get "ERROR 1045 (28000): Access denied for user 'root'@'localhost'", which is resolved by logging in with credentials:
$ mysql -u root -p

Note: Keep in mind that the MySQL password and the system password can be different.

Consult database

mysql> show databases;

Create Database

mysql> create database myddbb;

Use database

mysql> use myddbb;

Create base table

mysql> CREATE TABLE mytable (
         id MEDIUMINT NOT NULL AUTO_INCREMENT,
         name CHAR (30) NOT NULL,
         age INTEGER (30),
         salary INTEGER (30),
         PRIMARY KEY (id));

Create second table

mysql> CREATE TABLE mytable2 (
         id MEDIUMINT NOT NULL AUTO_INCREMENT,
         name CHAR (30) NOT NULL,
         age INTEGER (30),
         salary INTEGER (30),
         PRIMARY KEY (id));

Insert Data

mysql> INSERT INTO mytable (name, age, salary) values
        ("Peter", 24, 21000), ("Maria", 26, 24000), ("John", 28, 25000), ("Louis", 35, 28000), ("Monica", 42, 30000), ("Rose", 43, 25000), ("Susana", 45, 39000);

Note: If you have permission problems, enter MySQL and grant all privileges:

grant all privileges on *.* to 'root'@'localhost' IDENTIFIED BY 'MySQL_Key' WITH GRANT OPTION;

Exit MySQL

mysql> exit;

 

Load MySQL data to HDFS

Example of loading data from the table "mytable" in the database "myddbb" to the HDFS directory "my_hdfs_table".

MySQL to HDFS

$ sqoop import \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable \
 --driver=com.mysql.jdbc.Driver \
 --target-dir=/my_hdfs_table \
 --fields-terminated-by ',' \
 --lines-terminated-by '\n'

--target-dir: the HDFS directory where the data is stored.
--table: identifies the table to be copied.
--clear-staging-table: indicates that the data already in the staging table can be deleted.
--verbose: prints additional information to facilitate debugging.
--fields-terminated-by: defines the field delimiter.
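
Neither --clear-staging-table nor --verbose appears in the import example above. --verbose can be added to any Sqoop command, while --clear-staging-table is an export-side option used together with --staging-table (a table that must already exist with the same schema as the target). A minimal sketch, assuming a hypothetical staging table mytable2_staging:

$ sqoop export \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable2 \
 --staging-table=mytable2_staging \
 --clear-staging-table \
 --export-dir=/my_hdfs_table \
 --verbose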

 

Loading data from HDFS to MySQL

Example of loading data from the HDFS folder named “my_hdfs_table” to the “mytable2” table in the “myddbb” database.


$ sqoop export \
 --connect jdbc:mysql://localhost/myddbb \
 --username=root -P \
 --table=mytable2 \
 --export-dir=/my_hdfs_table -m 1

Note: If you get the warning "Set $ACCUMULO_HOME to the root of your Accumulo installation", it can be avoided with:

$ ACCUMULO_HOME='/var/lib/accumulo'
$ export ACCUMULO_HOME
$ sudo mkdir /var/lib/accumulo

To practice further with Sqoop, consult "Examples of Sqoop", a section that compiles many examples you may find useful.


Hive optimization technique

WITH MAX_DT AS (SELECT MAX(DATE) AS MAX_DATE FROM DEMODB.EMP) SELECT * FROM DEMODB.EMP A, MAX_DT B WHERE A.DATE = B.MAX_DATE;
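
The CTE computes MAX(DATE) once and joins the single-row result back to the table. For comparison, the equivalent scalar-subquery form (a sketch; only accepted by more recent Hive versions) would be:

SELECT * FROM DEMODB.EMP A WHERE A.DATE = (SELECT MAX(DATE) FROM DEMODB.EMP);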


Friday, October 2, 2020

spark 2.0 jdbc

 

The spark-shell does not run into these problems; this is an sbt dependency problem in IDEA.

1. Import package problem

  import java.util.Properties
  import org.apache.spark.sql
  import org.apache.spark.sql.types._
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.SparkSession
  import org.apache.spark.SparkConf
  import org.apache.spark.SparkContext

2. The build.sbt file is as follows:

name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.2"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15"

The org.apache.spark dependencies above must be version 2.0.0 or higher; otherwise SparkSession cannot be imported.

Another puzzling thing is that adding mysql-connector-java-8.0.15.jar by hand, outside of the sbt dependencies, does not work: the com.mysql.jdbc.Driver class cannot be found.

So the line libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15" is what solves the driver problem.

Wait for sbt's dependency resolution to complete, and the code runs successfully.

--------------------------------

 

import java.util.Properties

import org.apache.spark.sql
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

//import com.mysql.jdbc

object ConnectJDBC {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName("ConnectJDBC").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // Read the table ("spark" is the database name, "student" is the table name)
    val jdbcDF = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/spark")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "student")
      .option("user", "root")
      .option("password", "123456")
      .load()
    jdbcDF.show()

    // Below we set up two records representing two students
    val studentRDD = spark.sparkContext.parallelize(Array("1 Licheng M 26", "2 Jianghua M 27")).map(_.split(" "))

    // Set up the schema information
    val schema = StructType(List(StructField("id", IntegerType, true), StructField("name", StringType, true), StructField("gender", StringType, true), StructField("age", IntegerType, true)))

    // Create a Row object from each line; each Row object is a row in rowRDD
    val rowRDD = studentRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).trim, p(3).toInt))

    // Associate the Row objects with the schema, i.e. tie the data to the schema
    val studentDF = spark.createDataFrame(rowRDD, schema)

    // Create a prop variable to hold the JDBC connection parameters
    val prop = new Properties()
    prop.put("user", "root")                    // the username is root
    prop.put("password", "123456")              // the password is 123456
    prop.put("driver", "com.mysql.jdbc.Driver") // the JDBC driver class

    // /usr/local/spark/jars/mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar

    // Connect to the database in append mode, appending the records to the student table in the spark database
    studentDF.write.mode("append").jdbc("jdbc:mysql://localhost:3306/spark", "spark.student", prop)
  }
}

---------------------------

operation result:

+---+---------+------+---+
| id|     name|gender|age|
+---+---------+------+---+
|  1|  Licheng|     M| 26|
|  2| Jianghua|     M| 27|
+---+---------+------+---+