spark-shell does not run into these problems. They are sbt dependency problems in IDEA.

1. Package import problem

import java.util.Properties
import org.apache.spark.sql
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

The build.sbt file is as follows:

name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "2.1.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.2"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"
libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15"

The org.apache.spark dependencies above must be version 2.0.0 or higher. Otherwise, SparkSession cannot be imported.
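The build.sbt above mixes spark-core 2.1.0 with spark-sql 2.2.0. sbt normally resolves this to the newer spark-core, because spark-sql 2.2.0 depends on it, but pinning both modules to one version makes that explicit. Here is a minimal sketch, assuming both modules are meant to be 2.2.0 and that the HBase dependencies (left out here) are not needed for the JDBC example. The name `sparkVersion` is introduced only for illustration:

```scala
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.8"

// Assumption: one shared Spark version, taken from the spark-sql line in the post
val sparkVersion = "2.2.0"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (_2.11) to the artifact name
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  // JDBC driver, resolved by sbt instead of being added as an external jar
  "mysql" % "mysql-connector-java" % "8.0.15"
)
```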

Another puzzling thing is that adding mysql-connector-java-8.0.15.jar as an external jar, outside the sbt dependencies, does not work: the com.mysql.jdbc driver cannot be found.

So the line libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.15" is there to solve the driver problem.
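To confirm whether the driver is actually visible to the program, a small check like the following can be run from the same IDEA project. It is only a sketch: `DriverCheck` and `onClasspath` are hypothetical names, and the second class name is the one Connector/J 8.x uses for the same driver (the com.mysql.jdbc.Driver name is kept there as a deprecated alias).

```scala
import scala.util.Try

object DriverCheck {
  // True if the named class can be loaded from the current classpath
  def onClasspath(className: String): Boolean =
    Try(Class.forName(className)).isSuccess

  def main(args: Array[String]): Unit = {
    // The legacy driver name used in this post, and the Connector/J 8.x name
    Seq("com.mysql.jdbc.Driver", "com.mysql.cj.jdbc.Driver").foreach { name =>
      println(s"$name on classpath: ${onClasspath(name)}")
    }
  }
}
```

If both lines report false, the connector jar is not on the run classpath, which matches the symptom described above.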

Wait for sbt:dump to complete in IDEA, and the code then runs successfully.

--------------------------------

import java.util.Properties
import org.apache.spark.sql
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object ConnectJDBC {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ConnectJDBC").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // getOrCreate() reuses the SparkContext created above
    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // Read the existing rows of the student table
    val jdbcDF = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/spark") // "spark" is the database name
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "student") // "student" is the table name
      .option("user", "root")
      .option("password", "123456")
      .load()
    jdbcDF.show()

    // Two records representing two students
    val studentRDD = spark.sparkContext
      .parallelize(Array("1 Licheng M 26", "2 Jianghua M 27"))
      .map(_.split(" "))

    // Define the schema
    val schema = StructType(List(
      StructField("id", IntegerType, true),
      StructField("name", StringType, true),
      StructField("gender", StringType, true),
      StructField("age", IntegerType, true)))

    // Create Row objects; each Row is one row of rowRDD
    val rowRDD = studentRDD.map(p => Row(p(0).toInt, p(1).trim, p(2).trim, p(3).toInt))

    // Associate the Row objects with the schema
    val studentDF = spark.createDataFrame(rowRDD, schema)

    // Create a prop variable to hold the JDBC connection parameters
    val prop = new Properties()
    prop.put("user", "root") // the username is root
    prop.put("password", "123456") // the password is 123456
    prop.put("driver", "com.mysql.jdbc.Driver") // the driver is com.mysql.jdbc.Driver
    // /usr/local/spark/jars/mysql-connector-java-5.1.40/mysql-connector-java-5.1.40-bin.jar

    // Connect to the database and, in append mode, add the records to the student table in the spark database
    studentDF.write.mode("append").jdbc("jdbc:mysql://localhost:3306/spark", "spark.student", prop)
  }
}

---------------------------

Result:

+---+---------+------+---+
| id|     name|gender|age|
+---+---------+------+---+
|  1|  Licheng|     M| 26|
|  2| Jianghua|     M| 27|
+---+---------+------+---+
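The Row-building step in the code calls toInt directly on the split fields, so a malformed line would throw a NumberFormatException inside the job. One possible way to guard against that, sketched here in plain Scala without Spark, is to parse each line into an Option first. `StudentParsing` and `parseStudent` are hypothetical names, not part of the original program:

```scala
import scala.util.Try

object StudentParsing {
  // Returns None instead of throwing when a line does not have
  // exactly four fields or when id/age are not integers
  def parseStudent(line: String): Option[(Int, String, String, Int)] =
    line.trim.split("\\s+") match {
      case Array(id, name, gender, age) =>
        for {
          i <- Try(id.toInt).toOption
          a <- Try(age.toInt).toOption
        } yield (i, name, gender, a)
      case _ => None
    }
}
```

In the job, the parsed tuples could then be turned into Row objects, with the lines that return None filtered out beforehand.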

Hadoop

Friday, October 2, 2020

Spark 2.0 JDBC
