Apache SPARK with SQLContext :: IndexError

I am trying to run the basic example provided in the "Inferring the Schema Using Reflection" part of the Apache Spark documentation.

I am doing this on the Cloudera Quickstart VM (CDH5).

The example I am trying to run is as follows:

# sc is an existing SparkContext. 
from pyspark.sql import SQLContext, Row 
sqlContext = SQLContext(sc) 

# Load a text file and convert each line to a Row. 
lines = sc.textFile("/user/cloudera/analytics/book6_sample.csv") 
parts = lines.map(lambda l: l.split(",")) 
people = parts.map(lambda p: Row(name=p[0], age=int(p[1]))) 

# Infer the schema, and register the DataFrame as a table. 
schemaPeople = sqlContext.createDataFrame(people) 
schemaPeople.registerTempTable("people") 

# SQL can be run over DataFrames that have been registered as a table. 
teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19") 

# The results of SQL queries are RDDs and support all the normal RDD operations. 
teenNames = teenagers.map(lambda p: "Name: " + p.name) 
for teenName in teenNames.collect(): 
    print(teenName)

I ran the code exactly as shown above, but when I execute the last command (the for loop), I always get the error "IndexError: list index out of range".

The input file book6_sample is available at book6_sample.csv.


Please point out where I am going wrong.

Thanks in advance.

Regards, Sri


2016-06-28 Sri

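Your file has an empty line at the end, and that is what causes this error. Open the file in your text editor and delete that line; hopefully it will then work.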
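A minimal plain-Python sketch (no Spark needed) of why a blank line breaks the example, and of a code-side alternative to editing the file: `"".split(",")` returns `['']`, so `p[1]` in the example's lambda raises `IndexError`. The helper name `parse_person` and the sample lines are hypothetical, and also skipping lines whose age is not an integer (for instance a header row) is an extra assumption beyond the answer above.

```python
# Plain-Python sketch of the example's parsing step, runnable without Spark.
# parse_person is a hypothetical helper, not part of the Spark API.

def parse_person(line):
    """Split a "name,age" line; return (name, age), or None if the line is unusable."""
    parts = line.split(",")
    # A blank line gives [""], so parts[1] would raise IndexError
    if len(parts) < 2:
        return None
    try:
        # int() tolerates surrounding whitespace such as " 29" or a trailing "\r"
        return (parts[0].strip(), int(parts[1]))
    except ValueError:
        # Assumption: also skip rows like a "name,age" header
        return None

# Reproduce the failure mode seen in the question
try:
    "".split(",")[1]
except IndexError as exc:
    print("IndexError:", exc)

# Hypothetical sample input with a trailing blank line
sample = ["Michael, 29", "Andy, 30", ""]
parsed = [p for p in (parse_person(l) for l in sample) if p is not None]
print(parsed)
```

In PySpark the same idea could presumably replace the `parts`/`people` lines with something like `lines.map(parse_person).filter(lambda t: t is not None).map(lambda t: Row(name=t[0], age=t[1]))`; this is an untested sketch, not a verified fix for the asker's file.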


2016-06-28 06:40:54

Hey Sachin, this still did not work after making the suggested change. Thanks – Sri
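If removing the trailing blank line is not enough, other lines in the file may also be malformed. A hypothetical diagnostic sketch in plain Python that lists lines with fewer than two comma-separated fields (these raise `IndexError` in the example) or a non-integer second field (these would raise `ValueError` instead); `find_bad_lines` and the sample data are illustrative assumptions, not taken from the thread.

```python
# Hypothetical diagnostic helper (not part of Spark): report lines that the
# example's lambda Row(name=p[0], age=int(p[1])) would fail on.

def find_bad_lines(lines):
    """Return (line_number, line, reason) for every line that would fail parsing."""
    bad = []
    for number, line in enumerate(lines, start=1):
        parts = line.split(",")
        if len(parts) < 2:
            # p[1] would raise IndexError
            bad.append((number, line, "fewer than two comma-separated fields"))
            continue
        try:
            int(parts[1])
        except ValueError:
            # int(p[1]) would raise ValueError
            bad.append((number, line, "second field is not an integer"))
    return bad

# Hypothetical sample: a header, two good rows, a blank line, a wrong delimiter
sample = ["name,age", "Michael,29", "Andy,30", "", "Justin;19"]
for number, line, reason in find_bad_lines(sample):
    print(number, repr(line), reason)
```

On a local copy of book6_sample.csv this could be called as `find_bad_lines(open(path).read().splitlines())`, with `path` a hypothetical local path; inside PySpark, something like `lines.filter(lambda l: len(l.split(",")) < 2).take(5)` would show a few of the lines that trigger the `IndexError`.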
