2017-10-04

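Common approach to change the nullable property of all elements of a Spark SQL StructType in Scala

Is there a common approach to change the nullable property of all elements of any given StructType? It may be a nested StructType.

I saw that @eliasah marked this as a duplicate of "Spark Dataframe column nullable property change". The two questions are different: the answer there only works for one level and does not handle a hierarchical/nested StructType.

For example: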

root 
|-- user_id: string (nullable = false) 
|-- name: string (nullable = false) 
|-- system_process: array (nullable = false) 
| |-- element: struct (containsNull = false) 
| | |-- timestamp: long (nullable = false) 
| | |-- process: string (nullable = false) 
|-- type: string (nullable = false) 
|-- user_process: array (nullable = false) 
| |-- element: struct (containsNull = false) 
| | |-- timestamp: long (nullable = false) 
| | |-- process: string (nullable = false) 

I want to change nullable to true for all elements, so the result should be:

root 
|-- user_id: string (nullable = true) 
|-- name: string (nullable = true) 
|-- system_process: array (nullable = true) 
| |-- element: struct (containsNull = true) 
| | |-- timestamp: long (nullable = true) 
| | |-- process: string (nullable = true) 
|-- type: string (nullable = true) 
|-- user_process: array (nullable = true) 
| |-- element: struct (containsNull = true) 
| | |-- timestamp: long (nullable = true) 
| | |-- process: string (nullable = true) 

For convenient testing, here is a sample JSON schema of the StructType:

val jsonSchema="""{"type":"struct","fields":[{"name":"user_id","type":"string","nullable":false,"metadata":{}},{"name":"name","type":"string","nullable":false,"metadata":{}},{"name":"system_process","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"timestamp","type":"long","nullable":false,"metadata":{}},{"name":"process_id","type":"string","nullable":false,"metadata":{}}]},"containsNull":false},"nullable":false,"metadata":{}},{"name":"type","type":"string","nullable":false,"metadata":{}},{"name":"user_process","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"timestamp","type":"long","nullable":false,"metadata":{}},{"name":"process_id","type":"string","nullable":false,"metadata":{}}]},"containsNull":false},"nullable":false,"metadata":{}}]}""" 
import org.apache.spark.sql.types.{DataType, StructType}
DataType.fromJson(jsonSchema).asInstanceOf[StructType].printTreeString()

Answer

I finally worked out two solutions:

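  1. A trick: rewrite the schema's JSON string first, then create the StructType instance from the modified JSON string. The array-level containsNull flag has to be replaced as well, otherwise array elements stay non-nullable.

    DataType.fromJson(
      schema.json
        .replaceAll("\"nullable\":false", "\"nullable\":true")
        .replaceAll("\"containsNull\":false", "\"containsNull\":true")
    ).asInstanceOf[StructType]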
    
  2. A recursive approach that builds a new StructType, for example:

    import org.apache.spark.sql.types.{ArrayType, StructType}

    // Recursively rebuild the schema with every field marked nullable.
    def updateFieldsToNullable(structType: StructType): StructType =
      StructType(structType.map { f =>
        f.dataType match {
          case d: ArrayType =>
            val element = d.elementType match {
              case s: StructType => updateFieldsToNullable(s)
              case other => other
            }
            // containsNull = true so array elements match the desired output
            f.copy(nullable = true, dataType = ArrayType(element, containsNull = true))
          case s: StructType => f.copy(nullable = true, dataType = updateFieldsToNullable(s))
          case _ => f.copy(nullable = true)
        }
      })
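
The recursive idea can also be written against DataType rather than StructType, so that arrays of arrays and map values are covered too. The following is only a sketch under assumptions: the names NullableSchema, toNullable and toNullableSchema are hypothetical, the MapType case is not part of the original answer, and the small schema at the end is made up for illustration.

```scala
import org.apache.spark.sql.types._

object NullableSchema {
  // Recursively rebuild a data type with every nullability flag set to true.
  def toNullable(dt: DataType): DataType = dt match {
    case s: StructType =>
      StructType(s.fields.map(f =>
        f.copy(dataType = toNullable(f.dataType), nullable = true)))
    case a: ArrayType =>
      // Recursing on the element type also covers arrays of arrays.
      ArrayType(toNullable(a.elementType), containsNull = true)
    case m: MapType =>
      // Only the value side has a nullability flag on MapType.
      MapType(toNullable(m.keyType), toNullable(m.valueType), valueContainsNull = true)
    case other => other // atomic types carry no nullability flag of their own
  }

  // Convenience wrapper: a struct always maps to a struct, so the cast is safe.
  def toNullableSchema(schema: StructType): StructType =
    toNullable(schema).asInstanceOf[StructType]
}

// Illustrative schema (an assumption, loosely modelled on the question).
val strict = StructType(Seq(
  StructField("user_id", StringType, nullable = false),
  StructField("system_process", ArrayType(StructType(Seq(
    StructField("timestamp", LongType, nullable = false),
    StructField("process_id", StringType, nullable = false)
  )), containsNull = false), nullable = false),
  StructField("tags", MapType(StringType, StringType, valueContainsNull = false), nullable = false)
))

val relaxed = NullableSchema.toNullableSchema(strict)
relaxed.printTreeString()
```

Matching on DataType keeps the traversal in one place, at the cost of a cast in the wrapper. The input schema is left untouched because StructType and StructField are immutable.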
    