2016-09-21

Reading and writing files from MS Azure using Python

I am new to Python and Spark, and I am trying to load a file from Azure into a table. Below is my simple code.

import os 
import sys 

os.environ['SPARK_HOME'] = r"C:\spark-2.0.0-bin-hadoop2.7" 
sys.path.append(r"C:\spark-2.0.0-bin-hadoop2.7\python") 
sys.path.append(r"C:\spark-2.0.0-bin-hadoop2.7\python\lib\py4j-0.10.1-src.zip") 

from pyspark import SparkContext 
from pyspark import SparkConf 
from pyspark.sql.types import * 
from pyspark.sql import * 

sc = SparkContext("local", "Simple App") 
sqlContext = SQLContext(sc) 

def loadFile(path, rowDelimiter, columnDelimiter, firstHeaderColName): 
    # Read the file using a custom record delimiter 
    loadedFile = sc.newAPIHadoopFile( 
        path, "org.apache.hadoop.mapreduce.lib.input.TextInputFormat", 
        "org.apache.hadoop.io.LongWritable", "org.apache.hadoop.io.Text", 
        conf={"textinputformat.record.delimiter": rowDelimiter}) 
    # Split each record into columns and drop the header row 
    rddData = loadedFile.map(lambda l: l[1].split(columnDelimiter)) \
                        .filter(lambda f: f[0] != firstHeaderColName) 
    return rddData 

Schema = StructType([ 
    StructField("Column1", StringType(), True), 
    StructField("Column2", StringType(), True), 
    StructField("Column3", StringType(), True), 
    StructField("Column4", StringType(), True) 
]) 

rData = loadFile("wasbs://[email protected]/File.txt", 
                 '\r\n', "#|#", "Column1") 
DF = sqlContext.createDataFrame(rData, Schema) 
DF.write.saveAsTable("Table1")

I am getting an error like: FileNotFoundError: [WinError 2] The system cannot find the file specified
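On Windows, `[WinError 2]` usually means some path handed to the OS does not exist, for example an `SPARK_HOME` that points at a directory that is not actually on disk. A minimal sanity check, assuming a hypothetical install path, might look like this:

```python
import os

def check_spark_home(path):
    """Return True if the directory exists; print a warning otherwise."""
    if not os.path.isdir(path):
        print("SPARK_HOME does not exist: %s" % path)
        return False
    return True

# Hypothetical path -- substitute your actual Spark install directory
check_spark_home(r"C:\spark-2.0.0-bin-hadoop2.7")
```

Running a check like this before setting `os.environ['SPARK_HOME']` makes a bad path fail loudly instead of surfacing later as a cryptic `FileNotFoundError`.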


Are you using Spark on Azure HDInsight? Also, could you let me know which line in your code throws this error message?

Answer


@Miruthan, as far as I know, if we want to read data from WASB into Spark, the URL syntax is as follows:

wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path> 

Meanwhile, note that Azure Storage Blob (WASB) is used as the storage account associated with an HDInsight cluster. Could you double-check this? If you have any updates, please let me know.