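Apache Spark with SQLContext :: IndexError
I am trying to run the basic example from the "Inferring the Schema Using Reflection" section of the Apache Spark documentation.
I am doing this on the Cloudera QuickStart VM (CDH5).
The example I am trying to run is as follows: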
# sc is an existing SparkContext.
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
# Load a text file and convert each line to a Row.
lines = sc.textFile("/user/cloudera/analytics/book6_sample.csv")
parts = lines.map(lambda l: l.split(","))
people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
# Infer the schema, and register the DataFrame as a table.
schemaPeople = sqlContext.createDataFrame(people)
schemaPeople.registerTempTable("people")
# SQL can be run over DataFrames that have been registered as a table.
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
# The results of SQL queries are RDDs and support all the normal RDD operations.
teenNames = teenagers.map(lambda p: "Name: " + p.name)
for teenName in teenNames.collect():
    print(teenName)
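I ran the code exactly as shown above, but when I execute the last command (the for loop) I always get the error "IndexError: list index out of range".
The input file book6_sample is available at book6_sample.csv.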
Please point out where I am going wrong.
Thanks in advance.
Regards, Sri
Hey Sachin, this did not work after making the suggested change. Thanks – Sri
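One possible cause, not confirmed anywhere in this thread: `p[1]` raises "IndexError: list index out of range" whenever a line splits into fewer than two fields, for example a blank line or a line with no comma. RDD transformations are lazy, so a malformed line further down the file would only fail once `collect()` runs, which would match the error appearing at the for loop. The following is a minimal sketch under the assumption that every valid line has the form `name,age`. `parse_person` is a hypothetical helper, the sample lines are made up, and it returns plain tuples so it can be tried without Spark.

```python
def parse_person(line):
    """Return (name, age) for a well-formed 'name,age' line, else None."""
    parts = [field.strip() for field in line.split(",")]
    # A blank line or a line without a comma yields fewer than two fields,
    # which is the case where p[1] raises IndexError in the original lambda.
    if len(parts) < 2:
        return None
    try:
        return parts[0], int(parts[1])
    except ValueError:
        # Non-numeric age, e.g. a header row such as "name,age".
        return None


# Made-up sample input: two good lines, a blank line, a header, a short line.
sample_lines = ["Michael,29", "", "Andy, 30", "name,age", "Justin"]
parsed = [parse_person(line) for line in sample_lines]
good_rows = [row for row in parsed if row is not None]
print(good_rows)
```

In the Spark job, the same helper could be applied as `lines.map(parse_person).filter(lambda r: r is not None)` before building the `Row` objects, so that malformed lines are skipped instead of aborting the job.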