Spark: creating a Row in map
I saw a DataFrames tutorial at https://databricks.com/blog/2015/02/17/introducing-dataframes-in-spark-for-large-scale-data-science.html that is written in Python. I am trying to translate it into Scala.
They have the following code:
df = context.load("/path/to/people.json")
# RDD-style methods such as map, flatMap are available on DataFrames
# Split the bio text into multiple words.
words = df.select("bio").flatMap(lambda row: row.bio.split(" "))
# Create a new DataFrame to count the number of words
words_df = words.map(lambda w: Row(word=w, cnt=1)).toDF()
word_counts = words_df.groupBy("word").sum()
So I first read the data from a CSV file into a DataFrame df, and then I have:
val title_words = df.select("title").flatMap { row =>
  row.getAs[String]("title").split(" ") }
val title_words_df = title_words.map(w => Row(w,1)).toDF()
val word_counts = title_words_df.groupBy("word").sum()
but I don't know how to assign the field names to the rows in the line beginning with val title_words_df = ...
I am getting the error "value toDF is not a member of org.apache.spark.rdd.RDD[org.apache.spark.sql.Row]".
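For reference, here is a minimal sketch of what I am trying to end up with. It rests on assumptions that are not confirmed by the tutorial: that Spark 1.3 or later is on the classpath, that `toDF` only becomes available on an RDD of tuples (or case classes) after `import sqlContext.implicits._`, and that passing column names to `toDF` is an acceptable substitute for the named fields of the Python `Row(word=w, cnt=1)`. The object name `TitleWordCount`, the helper `wordCounts`, and the sample titles are hypothetical and exist only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical wrapper object, used only to make the sketch self-contained.
object TitleWordCount {

  // Counts the words in the "title" column of df.
  // The column names "word" and "cnt" mirror the Python tutorial.
  def wordCounts(sqlContext: SQLContext, df: DataFrame): DataFrame = {
    // Assumption: this import is what brings toDF into scope for RDDs of tuples.
    import sqlContext.implicits._

    // Drop down to the underlying RDD[Row] and split each title into words.
    val titleWords = df.select("title").rdd
      .flatMap(row => row.getString(0).split(" "))

    // Use a tuple instead of Row so the element type is known,
    // and name the columns explicitly when converting to a DataFrame.
    val titleWordsDf = titleWords.map(w => (w, 1)).toDF("word", "cnt")

    titleWordsDf.groupBy("word").sum("cnt")
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TitleWordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample data standing in for the CSV file.
    val df = sc.parallelize(Seq(Tuple1("hello spark"), Tuple1("hello scala")))
      .toDF("title")

    wordCounts(sqlContext, df).show()
    sc.stop()
  }
}
```

The sketch maps to tuples rather than `Row` because a bare `Row` carries no field names or types that Spark could infer a schema from, which is my guess at why `toDF` is not found on `RDD[Row]`.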
Thanks in advance for your help.