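Getting the minimum of a nested array in a Spark Dataset
I have a JSON server log file that I want to parse with Spark 2.2.0 and the Java API, so I read it into a Dataset:
Dataset<Row> df = spark.read().json(args[0]);
This produces the following schema: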
df.printSchema();
root
|-- timestamp: long (nullable = true)
|-- results: struct (nullable = true)
| |-- entities: array (nullable = true)
| | |-- element: struct (containsNull = true)
| | | |-- entity_id: string (nullable = true)
| | | |-- score: long (nullable = true)
| | | |-- is_available: boolean (nullable = true)
| |-- number_of_results: long (nullable = true)
I want to get the available entity with the lowest score, so that I end up with a Dataset like this:
root
|-- timestamp: long (nullable = true)
|-- results: struct (nullable = true)
| |-- entity: struct (nullable = true)
| | |-- entity_id: string (nullable = true)
| | |-- score: long (nullable = true)
| | |-- is_available: boolean (nullable = true)
How can I do this transformation?
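The selection rule described above (among `results.entities`, keep only elements whose `is_available` is true and take the one with the lowest `score`) can be pinned down in plain Java before wiring it into Spark. This is a minimal sketch, not Spark code: `Entity`, `MinAvailableEntity` and `lowestAvailable` are hypothetical names, and skipping null array elements, null scores and null `is_available` values is an assumption about how those cases should be treated. The same per-row logic could, for example, be applied inside a Dataset `map` or a UDF, but that wiring is not shown here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical plain-Java stand-in for one element of results.entities.
// Boxed types mirror the nullable Spark columns (score: long, is_available: boolean).
final class Entity {
    final String entityId;
    final Long score;
    final Boolean isAvailable;

    Entity(String entityId, Long score, Boolean isAvailable) {
        this.entityId = entityId;
        this.score = score;
        this.isAvailable = isAvailable;
    }

    @Override
    public String toString() {
        return "Entity(" + entityId + ", " + score + ", " + isAvailable + ")";
    }
}

final class MinAvailableEntity {
    // Returns the available entity with the lowest score, or empty if there is none.
    // Assumption: null array elements (containsNull = true), null scores and
    // null or false is_available values are all skipped.
    static Optional<Entity> lowestAvailable(List<Entity> entities) {
        if (entities == null) {
            return Optional.empty();
        }
        return entities.stream()
                .filter(Objects::nonNull)
                .filter(e -> Boolean.TRUE.equals(e.isAvailable))
                .filter(e -> e.score != null)
                .min(Comparator.comparingLong((Entity e) -> e.score));
    }
}
```

Returning `Optional.empty()` when no entity is available would correspond to a null `entity` field in the target schema.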