
I am programming with PySpark in the Eclipse IDE and am trying to move to Spark 1.4.1 so that I can eventually program in Python 3. The following program works in Spark 1.3.1, but in Spark 1.4.1 it throws the exception py4j.Py4JException: Method read([]) does not exist.

from pyspark import SparkContext, SparkConf 
from pyspark.sql.types import * 
from pyspark.sql import SQLContext 

if __name__ == '__main__': 
    conf = SparkConf().setAppName("MyApp").setMaster("local") 

    global sc 
    sc = SparkContext(conf=conf)  

    global sqlc 
    sqlc = SQLContext(sc) 

    symbolsPath = 'SP500Industry.json' 
    symbolsRDD = sqlc.read.json(symbolsPath) 

    print "Done"" 

The traceback I get is the following:

Traceback (most recent call last):
  File "/media/gavin/20A6-76BF/Current Projects Luna/PySpark Test/Test.py", line 21, in <module>
    symbolsRDD = sqlc.read.json(symbolsPath) #rdd with all symbols (and their industries
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/context.py", line 582, in read
    return DataFrameReader(self)
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/pyspark/sql/readwriter.py", line 39, in __init__
    self._jreader = sqlContext._ssql_ctx.read()
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/gavin/spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o18.read. Trace:
py4j.Py4JException: Method read([]) does not exist
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
    at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
    at py4j.Gateway.invoke(Gateway.java:252)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

The external libraries on my project are ... spark-1.4.1-bin-hadoop2.6/python ... spark-1.4.1-bin-hadoop2.6/python/lib/py4j-0.8.2.1-src.zip ... spark-1.4.1-bin-hadoop2.6/python/lib/pyspark.zip (I have tried both with and without this last one).
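
For completeness, here is a rough manual equivalent of those Eclipse entries (a sketch only; the install path is taken from the traceback above, so adjust it to your own layout):

import os, sys

# Mirror the Eclipse "external libraries" entries on the Python path.
# The SPARK_HOME value below comes from the traceback; change it as needed.
SPARK_HOME = '/home/gavin/spark-1.4.1-bin-hadoop2.6'
os.environ['SPARK_HOME'] = SPARK_HOME
sys.path.insert(0, os.path.join(SPARK_HOME, 'python'))
sys.path.insert(0, os.path.join(SPARK_HOME, 'python/lib/py4j-0.8.2.1-src.zip'))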

Can someone help me figure out what I am doing wrong?

Answer


You need to set the format to 'json' before calling load. Otherwise, Spark assumes you are trying to load a Parquet file.

symbolsRDD = sqlc.read.format('json').load(symbolsPath) 
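
As a side note on the 1.4 API: read.json(path) is itself a shortcut that sets the JSON format and loads in one step, so the explicit format('json') call only matters when you go through the generic load; chaining format('json') in front of .json(...) would be redundant.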

However, I still cannot figure out why you are getting an error on the read method itself. Spark should instead be complaining that it found an invalid Parquet file.
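
One thing worth ruling out (an assumption on my part, not something confirmed here): SQLContext.read only exists on the JVM side as of Spark 1.4, so py4j reporting Method read([]) does not exist is exactly what you would see if the Python files come from 1.4.1 while an older Spark is still running in the JVM, for example through a stale SPARK_HOME or classpath entry. A quick check from the same script:

# Print the version of the Spark JVM that py4j is actually talking to.
# If this prints 1.3.x rather than 1.4.1, the JVM-side SQLContext really
# has no read() method, which would explain the error above.
print(sc.version)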


I get exactly the same error as in the OP even with your adjustment. Thanks for the help anyway. –