I want to save my processed RDD to a MySQL table. I am using a Spark DataFrame, but I get the following error and cannot connect to MySQL from PySpark:
py4j.protocol.Py4JJavaError: An error occurred while calling o216.jdbc.
: java.sql.SQLException: No suitable driver found for jdbc:mysql://localhost:3306/student?user=root&password=root.
I added the MySQL connector jar when starting spark-shell:
spark-shell --driver-class-path /path-to-mysql-jar/mysql-connector-java-5.1.38-bin.jar
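The "No suitable driver found" error usually means the connector jar is not visible to the JVM that actually opens the JDBC connection. Since the job runs from PySpark rather than spark-shell, a minimal sketch (reusing the placeholder jar path from the command above) is to pass the jar through both --jars and --driver-class-path when launching:

pyspark --jars /path-to-mysql-jar/mysql-connector-java-5.1.38-bin.jar \
        --driver-class-path /path-to-mysql-jar/mysql-connector-java-5.1.38-bin.jar

spark-submit accepts the same two flags if the script is submitted as a file instead of run interactively.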
from pyspark import SparkContext
from datetime import datetime
import os
import sys
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
file1 = sc.textFile("/home/hadoop/text1").cache()
file2 = sc.textFile("/home/hadoop/text2").cache()
file3 = file1.union(file2).coalesce(1).map(lambda line: line.split(','))
file1.unpersist()
file2.unpersist()
result = file3.map(lambda x: (x[0]+', '+x[1], float(x[2]))).reduceByKey(lambda a, b: a+b).sortByKey(True).coalesce(1)
result = result.map(lambda x: x[0]+','+str(x[1]))
schema_site = sqlContext.createDataFrame(result)
schema_site.registerTempTable("table1")
mysql_url="jdbc:mysql://localhost:3306/test?user=root&password=root&driver=com.mysql.jdbc.Driver"
schema_site.write.jdbc(url=mysql_url, table="table1", mode="append")
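As an alternative to embedding user, password, and driver in the URL, they can be passed through the properties argument of DataFrameWriter.jdbc. A sketch of the same write, assuming the connector jar is already on the classpath:

mysql_url = "jdbc:mysql://localhost:3306/test"
schema_site.write.jdbc(url=mysql_url,
                       table="table1",
                       mode="append",
                       properties={"user": "root",
                                   "password": "root",
                                   "driver": "com.mysql.jdbc.Driver"})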
I am using spark-1.5.0-bin-hadoop2.4 and have also set up the Hive metastore.
So how can I load my RDD result into the MySQL table?
Input files:
file1 contents are:
1234567 65656545 12
1234567 65675859 11
file2 contents are:
1234567 65656545 12
1234567 65675859 11
and the resultant RDD is:
1234567 65656545 24
1234567 65675859 22
I created the table in MySQL with three columns:
std_id std_code std_res
and I want the table output to be:
std_id std_code std_res
1234567 65656545 24
1234567 65675859 22
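One thing to check besides the driver: createDataFrame cannot infer a schema from an RDD of plain strings, so joining each record into a single comma-separated string before calling it will fail independently of the JDBC issue. A minimal sketch, assuming the column names std_id, std_code, std_res above, that keeps the fields as a tuple and names the columns to match the MySQL table:

# 'result' here is the (key, sum) pair RDD produced by reduceByKey/sortByKey,
# before it is flattened into a single string
triples = result.map(lambda kv: (kv[0].split(',')[0].strip(),
                                 kv[0].split(',')[1].strip(),
                                 kv[1]))
schema_site = sqlContext.createDataFrame(triples, ["std_id", "std_code", "std_res"])
schema_site.write.jdbc(url=mysql_url, table="table1", mode="append")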
Similar question: http://stackoverflow.com/a/31478590/2308683