2015-06-10 65 views
4

我需要使用sql方法在SparkSQL中嵌套結構的幫助。我創建了這樣的結構在現有RDD(dataRDD)的頂部上的數據幀:SparkSQL - 訪問嵌套結構Row(field1,field2 = Row(..))

schema=StructType([ StructField("m",LongType()) , 
        StructField("field2", StructType([ 
        StructField("st",StringType()), 
        StructField("end",StringType()), 
        StructField("dr",IntegerType()) ])) 
        ]) 

printSchema()返回此:

root 
|-- m: long (nullable = true) 
|-- field2: struct (nullable = true) 
| |-- st: string (nullable = true) 
| |-- end: string (nullable = true) 
| |-- dr: integer (nullable = true) 

創建從數據RDD數據幀和應用了架構運作良好。

df= sqlContext.createDataFrame(dataRDD, schema) 
df.registerTempTable("logs") 

但檢索數據是不工作:

res = sqlContext.sql("SELECT m, field2.st FROM logs") # <- This fails 

...org.apache.spark.sql.AnalysisException: cannot resolve 'field.st' given input columns msisdn, field2; 

res = sqlContext.sql("SELECT m, field2[0] FROM logs") # <- Also fails 
...org.apache.spark.sql.AnalysisException: unresolved operator 'Project [field2#1[0] AS c0#2]; 

res = sqlContext.sql("SELECT m, st FROM logs") # <- Also not working 
...cannot resolve 'st' given input columns m, field2; 

所以,我怎麼能訪問嵌套結構的SQL語法? 感謝

回答

3

你有別的事情發生在你的測試,因爲field2.st是正確的語法:

case class field2(st: String, end: String, dr: Int) 

val schema = StructType(
    Array(
    StructField("m",LongType), 
    StructField("field2", StructType(Array(
     StructField("st",StringType), 
     StructField("end",StringType), 
     StructField("dr",IntegerType) 
    ))) 
) 
) 

val df2 = sqlContext.createDataFrame(
    sc.parallelize(Array(Row(1,field2("this","is",1234)),Row(2,field2("a","test",5678)))), 
    schema 
) 

/* df2.printSchema 
root 
|-- m: long (nullable = true) 
|-- field2: struct (nullable = true) 
| |-- st: string (nullable = true) 
| |-- end: string (nullable = true) 
| |-- dr: integer (nullable = true) 
*/ 

val results = sqlContext.sql("select m,field2.st from df2") 

/* results.show 
m st 
1 this 
2 a 
*/ 

回頭看看你的錯誤信息:cannot resolve 'field.st' given input columns msisdn, field2 - fieldfield2。再次檢查您的代碼 - 名稱不對齊。

+0

是的,這是一個錯字。但仍然field2.st引發此錯誤: 18:19:05 WARN TaskSetManager:在階段1.0(TID 2,BICHDP2.TD)中丟失的任務1.0:java.lang.ClassCastException:java.util.ArrayList不能轉換爲組織。 apache.spark.sql.Row at org.apache.spark.sql.catalyst.expressions.StructGetField.eval(complexTypes.scala:93) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions .scala:113) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at – frengel

相關問題