
Optional array in Avro schema

I'd like to know whether it's possible to have an optional array. Let's assume a schema like this:

{
    "type": "record",
    "name": "test_avro",
    "fields": [
        {"name": "test_field_1", "type": "long"},
        {"name": "subrecord", "type": [{
            "type": "record",
            "name": "subrecord_type",
            "fields": [{"name": "field_1", "type": "long"}]
        }, "null"]},
        {"name": "simple_array", "type": {
            "type": "array",
            "items": "string"
        }}
    ]
}

Trying to write an Avro record without "simple_array" results in an NPE in the DataFileWriter. The subrecord works just fine, but when I try to define the array as optional:

{"name": "simple_array", 
"type":[{ 
    "type": "array", 
    "items": "string" 
    }, "null"] 

it doesn't result in an NPE, but in a runtime exception:

AvroRuntimeException: Not an array schema: [{"type":"array","items":"string"},"null"] 

Thanks.

Answer


I think what you want here is a union of null and the array:

{ 
    "type":"record", 
    "name":"test_avro", 
    "fields":[{ 
      "name":"test_field_1", 
      "type":"long" 
     }, 
     { 
      "name":"subrecord", 
      "type":[{ 
        "type":"record", 
        "name":"subrecord_type", 
        "fields":[{ 
          "name":"field_1", 
          "type":"long" 
         } 
        ] 
       }, 
       "null" 
      ] 
     }, 
     { 
      "name":"simple_array", 
      "type":["null", 
       { 
        "type":"array", 
        "items":"string" 
       } 
      ], 
      "default":null 
     } 
    ] 
} 

When I use the above schema with sample data in Python, here's the result (schema_string is the JSON string above):

>>> from avro import io, datafile, schema 
>>> from json import dumps 
>>> 
>>> sample_data = {'test_field_1':12L} 
>>> rec_schema = schema.parse(schema_string) 
>>> rec_writer = io.DatumWriter(rec_schema) 
>>> rec_reader = io.DatumReader() 
>>> 
>>> # write avro file 
... df_writer = datafile.DataFileWriter(open("/tmp/foo", 'wb'), rec_writer, writers_schema=rec_schema) 
>>> df_writer.append(sample_data) 
>>> df_writer.close() 
>>> 
>>> # read avro file 
... df_reader = datafile.DataFileReader(open('/tmp/foo', 'rb'), rec_reader) 
>>> print dumps(df_reader.next()) 
{"simple_array": null, "test_field_1": 12, "subrecord": null} 

Had the same problem with a Java List; your answer solved my problem. Thanks! – forhas 2013-10-21 14:03:46


I'm getting the same error. In my setup, I'm trying to process Avro files with a MapReduce Java program. That job succeeds. The next stage of the data pipeline is to create a Hive table (AvroSerDe) on top of the transformed data. The table is also created successfully, but when I try to query it with HQL (which in turn runs a MapReduce job), the job fails with "Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable". – venBigData 2016-04-25 23:16:24