Reading and writing files from MS Azure using Python

I am new to Python and Spark, and I am trying to load a file from Azure into a table. Here is my simple code:
import os
import sys
os.environ['SPARK_HOME'] = "C:\spark-2.0.0-bin-hadoop2.74"
sys.path.append("C:\spark-2.0.0-bin-hadoop2.7\python")
sys.path.append("C:\spark-2.0.0-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip")
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql import *
sc = SparkContext("local", "Simple App")
def loadFile(path, rowDelimeter, columnDelimeter, firstHeaderColName):
    loadedFile = sc.newAPIHadoopFile(path, "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
                                     "org.apache.hadoop.io.LongWritable", "org.apache.hadoop.io.Text",
                                     conf={"textinputformat.record.delimiter": rowDelimeter})
    rddData = loadedFile.map(lambda l: l[1].split(columnDelimeter)).filter(lambda f: f[0] != firstHeaderColName)
    return rddData

Schema = StructType([
    StructField("Column1", StringType(), True),
    StructField("Column2", StringType(), True),
    StructField("Column3", StringType(), True),
    StructField("Column4", StringType(), True)
])

rData = loadFile("wasbs://[email protected]/File.txt",
                 '\r\n', "#|#", "Column1")
DF = sc.createDataFrame(Data,Schema)
DF.write.saveAsTable("Table1")
I am getting an error like FileNotFoundError: [WinError 2] The system cannot find the file specified.
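For reference, here is a minimal sketch of the setup with the likely culprits adjusted, assuming Spark 2.0.0 is actually unpacked at C:\spark-2.0.0-bin-hadoop2.7 (the SPARK_HOME in the code above points to spark-2.0.0-bin-hadoop2.74, and on Windows a SPARK_HOME that does not exist is a common cause of exactly this FileNotFoundError when PySpark tries to launch spark-submit). It also assumes the wasbs storage account, container, and key are already configured, and notes that createDataFrame is a SQLContext/SparkSession method rather than a SparkContext one, and that the RDD returned above is bound to rData, not Data:

import os
import sys

spark_home = r"C:\spark-2.0.0-bin-hadoop2.7"   # one consistent directory name (no "2.74")
os.environ['SPARK_HOME'] = spark_home
sys.path.append(os.path.join(spark_home, "python"))
sys.path.append(os.path.join(spark_home, "python", "lib", "py4j-0.10.1-src.zip"))

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import StructType, StructField, StringType

sc = SparkContext("local", "Simple App")
sqlContext = SQLContext(sc)   # createDataFrame lives here, not on the SparkContext

Schema = StructType([
    StructField("Column1", StringType(), True),
    StructField("Column2", StringType(), True),
    StructField("Column3", StringType(), True),
    StructField("Column4", StringType(), True)
])

# loadFile is the function defined in the snippet above
rData = loadFile("wasbs://[email protected]/File.txt",
                 '\r\n', "#|#", "Column1")
DF = sqlContext.createDataFrame(rData, Schema)   # pass the RDD that was actually returned
DF.write.saveAsTable("Table1")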
Are you using Spark on Azure HDInsight? Also, could you let me know which line of your code throws this error message? –