Friday, October 30, 2020
hive Parquet to textfile
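A minimal sketch of converting a Parquet-backed Hive table to a textfile table; the table names mytable_parquet and mytable_text are placeholders, not from the original post:
hive> -- copy the data into a new table stored as plain text
hive> CREATE TABLE mytable_text STORED AS TEXTFILE AS SELECT * FROM mytable_parquet;
hive> -- or switch the file format of an existing table (only affects data written afterwards)
hive> ALTER TABLE mytable_parquet SET FILEFORMAT TEXTFILE;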
Sunday, October 18, 2020
hive example query
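A minimal example query, assuming a table with name, age and salary columns like the mytable used in the Sqoop examples further down:
hive> SELECT name, age FROM mytable WHERE salary > 25000 ORDER BY age;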
Wednesday, October 7, 2020
sqoop Compression snappy and avro Format
Locate core-site.xml to find which compression codecs are available.
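A quick way to check, assuming a typical Hadoop layout (the /etc/hadoop/conf path may differ on your installation):
$ locate core-site.xml
$ grep -A 1 "io.compression.codecs" /etc/hadoop/conf/core-site.xml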
Example of loading data from MySQL to HDFS (compression: Snappy and Avro format)
$ sqoop import \
  --connect jdbc:mysql://localhost:3306/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/ej_snappy_avro \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --as-avrodatafile
Example of loading data from MySQL to HDFS (compression: gzip and Avro format)
$ sqoop import \
  --connect jdbc:mysql://localhost/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/ej_gzip_avro \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.GzipCodec \
  --as-avrodatafile
Example of loading data from MySQL to HDFS (compression: BZIP2 and Sequence format)
$ sqoop import \
  --connect jdbc:mysql://localhost/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/ej_bzip2_sequence \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.BZip2Codec \
  --as-sequencefile
Example of loading data from MySQL to HDFS (restricting data with columns)
$ sqoop import \
  --connect jdbc:mysql://localhost/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/ej_2_columns \
  --columns name,age
Example of loading data from MySQL to HDFS (restricting data with WHERE)
$ sqoop import \
  --connect jdbc:mysql://localhost/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/ej_mayor_age_40 \
  --where "age > 40"
Example of loading data from MySQL to HDFS (incremental load)
In order to perform an incremental load we need to add new data to the table "mytable"; to do this we execute the following statement in MySQL:
mysql> INSERT INTO mytable (name, age, salary) VALUES ("Diego", 24, 21000), ("Rosa", 26, 24000), ("Javier", 28, 25000), ("Lorena", 35, 28000), ("Miriam", 42, 30000), ("Patricia", 43, 25000), ("Natalia", 45, 39000);
Note: the insertion must be done in the database "mybbdd".
Once the insertion is done we can perform the incremental load of the new rows, which start at id 8; we pass --last-value 7 so that Sqoop imports every row whose id is greater than 7.
$ sqoop import \
  --connect jdbc:mysql://localhost/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/my_table_hdfs \
  --incremental append \
  --check-column id \
  --last-value 7
Example of loading data from MySQL to HDFS, queryable from Hive
In order to insert the table into a Hive database, we must first create the database where it will be inserted, to avoid problems:
hive> CREATE DATABASE mybbddhive;
Once the database is created, you are ready to run the query:
$ sqoop import \
  --connect jdbc:mysql://localhost/mybbdd \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/ej_hive \
  --compress \
  --compression-codec org.apache.hadoop.io.compress.SnappyCodec \
  --hive-import \
  --hive-database mybbddhive \
  --create-hive-table \
  --hive-table ej_hive_table
Creating a MySQL database table
A database with a test table is created using the following commands.
Accessing MySQL
$ mysql
You may get "ERROR 1045 (28000): Access denied for user 'root'@'localhost'", which is resolved with: mysql -u root -p
Note: keep in mind that the MySQL password and the system password can be different.
List the databases
mysql> show databases;
Create Database
mysql> create database myddbb;
Use database
mysql> use myddbb;
Create base table
mysql>CREATE TABLE mytable ( id MEDIUMINT NOT NULL AUTO_INCREMENT, name CHAR (30) NOT NULL, age INTEGER (30), salary INTEGER (30), PRIMARY KEY (id));
CREATE TABLE 2
mysql>CREATE TABLE mytable2 ( id MEDIUMINT NOT NULL AUTO_INCREMENT, name CHAR (30) NOT NULL, age INTEGER (30), salary INTEGER (30), PRIMARY KEY (id));
Insert Data
mysql> INSERT INTO mytable (name, age, salary) values
("Peter", 24, 21000), ("Maria", 26, 24000), ("John", 28, 25000), ("Louis", 35, 28000), ("Monica", 42, 30000), ("Rose", 43, 25000), ("Susana", 45, 39000);
Note: If you have permission problems, enter MySQL and grant all permissions:
mysql> grant all privileges on *.* to 'root'@'localhost' IDENTIFIED BY 'MySQL_Key' WITH GRANT OPTION;
Exit MySQL
mysql>exit;
Load MySQL data to HDFS
Example of loading data from the table "mytable" of the database "myddbb" into the HDFS directory named "my_hdfs_table".
$ sqoop import \
  --connect jdbc:mysql://localhost/myddbb \
  --username=root -P \
  --table=mytable \
  --driver=com.mysql.jdbc.Driver \
  --target-dir=/my_hdfs_table \
  --fields-terminated-by ',' \
  --lines-terminated-by '\n'
--target-dir: HDFS directory where the data is stored.
--table: identifies the table to be copied.
--clear-staging-table: indicates that any data present in the staging table can be deleted.
--verbose: prints additional information to facilitate debugging.
--fields-terminated-by: defines the field delimiter.
Loading data from HDFS to MySQL
Example of loading data from the HDFS folder named “my_hdfs_table” to the “mytable2” table in the “myddbb” database.
$ sqoop export \
  --connect jdbc:mysql://localhost/myddbb \
  --username=root -P \
  --table=mytable2 \
  --export-dir=/my_hdfs_table \
  -m 1
Note: If you get the warning "set $ACCUMULO_HOME to the root of your Accumulo installation", it can be avoided with:
$ sudo mkdir /var/lib/accumulo
$ export ACCUMULO_HOME='/var/lib/accumulo'
To practice with Sqoop, consult "Examples of Sqoop", a section that compiles many examples that may be useful to you.
Source: official Sqoop documentation
optimization technique
Friday, October 2, 2020
Spark 2.0 JDBC
Spark-shell does not run into so many problems; this is an sbt dependency issue in IDEA (IntelliJ). The following imports are needed:
- import java.util.Properties
- import org.apache.spark.sql
- import org.apache.spark.sql.types._
- import org.apache.spark.sql.Row
- import org.apache.spark.sql.SparkSession
- import org.apache.spark.SparkConf
- import org.apache.spark.SparkContext
The build.sbt file is as follows:
name := "Simple Project" version := "1.0" scalaVersion := "2.11.8" libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0" libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.2" libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.2" libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.2" libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0" libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15"
The org.apache.spark dependencies above must be version 2.0.0 or higher; otherwise SparkSession cannot be imported.
Another puzzling thing is that a mysql-connector-java-8.0.15.jar added manually, outside the sbt dependencies, does not work: the com.mysql.jdbc driver cannot be found.
So the line libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15" is what solves the driver problem.
Wait for sbt to finish resolving the dependencies, then run the code; it completes successfully.
--------------------------------
import java.util.Properties
import org.apache.spark.sql
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
//import com.mysql.jdbc
//SparkSession
object ConnectJDBC {
def main(args: Array[String]): Unit = {
val conf = new SparkConf().setAppName("ConnectJDBC").setMaster("local[*]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
// read the information
val jdbcDF = spark.read.format("jdbc").option("url", "jdbc:mysql://localhost:3306/spark") // "spark" is the database name
.option("driver", "com.mysql.jdbc.Driver").option("dbtable", "student") // "student" is the table name
.option("user", "root").option("password", "123456").load()
jdbcDF.show()
// Below we create two records representing two students
val studentRDD = spark.sparkContext.parallelize(Array("1 Licheng M 26", "2 Jianghua M 27")).map(_.split(" "))
// The following sets up the schema information
val schema = StructType(List(StructField("id", IntegerType, true), StructField("name", StringType, true), StructField("gender", StringType, true), StructField("age", IntegerType, true)))
// Create a Row object for each record; each Row object is one row in rowRDD
val rowRDD = studentRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).trim, p(3).toInt))
// Associate the Row objects with the schema, i.e. link the data to the schema
val studentDF = spark.createDataFrame(rowRDD, schema)
// Create a prop variable to hold the JDBC connection parameters
val prop = new Properties()
Prop.put("user", "root") // indicates that the username is root
Prop.put("password", "123456") // indicates that the password is hadoop
Prop.put("driver", "com.mysql.jdbc.Driver") // indicates that the driver is com.mysql.jdbc.Driver
// /usr/local/spark/jars/mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar
// Connect to the database in append mode, appending the records to the student table of the spark database
studentDF.write.mode("append").jdbc("jdbc:mysql://localhost:3306/spark", "spark.student", prop)
}
}
---------------------------
Result of running the program:
+---+---------+------+---+
| id| name|gender|age|
+---+---------+------+---+
| 1| Licheng| M| 26|
| 2| Jianghua| M| 27|
+---+---------+------+---+