I am new to Python Spark. Below are my Spark DataFrame and the JSON object for which I want to define a schema.
df = sqlContext.read.load("result.json", format="json")
The JSON object:
df.collect()
[Row(Dorothy=[u'CA', u'F', u'1910', u'220'], Frances=[u'CA', u'F', u'1910', u'134'], Helen=[u'CA', u'F', u'1910', u'239'], Margaret=[u'CA', u'F', u'1910', u'163'], Mary=[u'CA', u'F', u'1910', u'295'])]
When I try to assign field names to the values above:
df.select(Row("Name" =["State","Gender","Year","Count"])).write.save("result.json",format = 'json')
I get the following error:
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.col. Trace:py4j.Py4JException: Method col([class java.util.ArrayList]) does not exist
Can you help me with how to define the schema for the DataFrame?
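One possible reading of the error is that `select` was handed a Python list where it expects a column name or a `Column`; `select` picks existing columns and does not attach new field names to array elements. A minimal sketch of the reshaping step, in plain Python and assuming the single row has first been turned into a dict (for example via `df.first().asDict()`), could look like this. The `row` literal, the `FIELDS` list, and the `to_records` helper are illustrative names, not part of Spark:

```python
import json

# Field names from the question, in the order the values appear in each array.
FIELDS = ["State", "Gender", "Year", "Count"]

# Stand-in for df.first().asDict() (assumption: the DataFrame holds one row
# whose columns are names and whose values are 4-element arrays).
row = {
    "Dorothy": ["CA", "F", "1910", "220"],
    "Frances": ["CA", "F", "1910", "134"],
    "Helen": ["CA", "F", "1910", "239"],
    "Margaret": ["CA", "F", "1910", "163"],
    "Mary": ["CA", "F", "1910", "295"],
}


def to_records(row, fields=FIELDS):
    """Turn {name: [values...]} into one dict per name with named fields."""
    records = []
    for name in sorted(row):
        record = {"Name": name}
        # Pair each field name with the value at the same position.
        record.update(zip(fields, row[name]))
        records.append(record)
    return records


records = to_records(row)

# One JSON object per line, the layout Spark's JSON reader expects.
json_lines = "\n".join(json.dumps(r, sort_keys=True) for r in records)
print(json_lines)
```

Each record now carries `Name`, `State`, `Gender`, `Year`, and `Count` as separate fields. The records might then be turned back into a DataFrame, for example with `sqlContext.createDataFrame`, though that step is not shown or tested here.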