0
我有一個數據集與String [],我很努力從中提取列。這裏的代碼無法從數據框中提取數組/列表,AnalysisException:需要結構類型,但得到二進制
import static org.apache.spark.sql.functions.col;
//Read parquet data
Dataset<Row> readerDF = spark.readStream().format("parquet").
List<String> columns = Arrays.asList("city","country");
//Interested in only field in data for now 'fieldMap' which is Map<String,String>
Dataset<String[]> stringArrDF = readerDF.map((MapFunction<Row, String[]>) row -> {
Map<String,String> fields = row.getJavaMap(row.fieldIndex("fieldMap"));
List<String> columnList = new ArrayList<>();
columns.forEach(columnName ->
{
columnList.add(fields.getOrDefault(columnName, ""));
});
return columnList.toArray(new String[columns.size]);
}, Encoders.kryo(String[].class));
//I was expecting to extract city here:
Dataset ds = stringArrDF.select(col("value").getItem(1).as("city"));
但它失敗,下面的例外。
線程「main」中的異常org.apache.spark.sql.AnalysisException: 無法從值#22提取值;
如何從數據集訪問字符串[]或列表字段?
我如何才能使它與Kryo一起工作?或者它只是用於自定義數據類型?任何好的教程沿着這些線? –
你可以參考http://spark.apache.org/docs/latest/sql-programming-guide.html#creating-datasets和https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql -Encoder.html – abaghel