4
我需要使用sql方法在SparkSQL中嵌套結構的幫助。我創建了這樣的結構在現有RDD(dataRDD)的頂部上的數據幀:SparkSQL - 訪問嵌套結構Row(field1,field2 = Row(..))
schema=StructType([ StructField("m",LongType()) ,
StructField("field2", StructType([
StructField("st",StringType()),
StructField("end",StringType()),
StructField("dr",IntegerType()) ]))
])
printSchema()返回此:
root
|-- m: long (nullable = true)
|-- field2: struct (nullable = true)
| |-- st: string (nullable = true)
| |-- end: string (nullable = true)
| |-- dr: integer (nullable = true)
創建從數據RDD數據幀和應用了架構運作良好。
df= sqlContext.createDataFrame(dataRDD, schema)
df.registerTempTable("logs")
但檢索數據是不工作:
res = sqlContext.sql("SELECT m, field2.st FROM logs") # <- This fails
...org.apache.spark.sql.AnalysisException: cannot resolve 'field.st' given input columns msisdn, field2;
res = sqlContext.sql("SELECT m, field2[0] FROM logs") # <- Also fails
...org.apache.spark.sql.AnalysisException: unresolved operator 'Project [field2#1[0] AS c0#2];
res = sqlContext.sql("SELECT m, st FROM logs") # <- Also not working
...cannot resolve 'st' given input columns m, field2;
所以,我怎麼能訪問嵌套結構的SQL語法? 感謝
是的,這是一個錯字。但仍然field2.st引發此錯誤: 18:19:05 WARN TaskSetManager:在階段1.0(TID 2,BICHDP2.TD)中丟失的任務1.0:java.lang.ClassCastException:java.util.ArrayList不能轉換爲組織。 apache.spark.sql.Row at org.apache.spark.sql.catalyst.expressions.StructGetField.eval(complexTypes.scala:93) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions .scala:113) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at – frengel