在Google Cloud Dataproc中運行Spark作業。使用BigQuery Connector將從作業輸出的json數據加載到BigQuery表中。Spark BigQuery連接器:寫入ARRAY類型會導致異常:「」ARRAY的值無效「
BigQuery Standard-SQL data types documentation指出支持ARRAY類型。
我的Scala代碼是:
val outputDatasetId = "mydataset"
val tableSchema = "["+
"{'name': '_id', 'type': 'STRING'},"+
"{'name': 'array1', 'type': 'ARRAY'},"+
"{'name': 'array2', 'type': 'ARRAY'},"+
"{'name': 'number1', 'type': 'FLOAT'}"+
"]"
// Output configuration
BigQueryConfiguration.configureBigQueryOutput(
conf, projectId, outputDatasetId, "outputTable",
tableSchema)
//Write visits to BigQuery
jsonData.saveAsNewAPIHadoopDataset(conf)
但作業引發此異常:
{
"code" : 400,
"errors" : [ {
"domain" : "global",
"message" : "Invalid value for: ARRAY is not a valid value",
"reason" : "invalid"
} ],
"message" : "Invalid value for: ARRAY is not a valid value"
}
at
com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAnThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:432)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:287)
at com.google.cloud.hadoop.io.bigquery.BigQueryRecordWriter.close(BigQueryRecordWriter.java:358)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$5.apply$mcV$sp(PairRDDFunctions.scala:1124)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1366)
... 8 more
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException:
400 Bad Request
這是一個傳統與標準SQL的問題?或者,Spark的BigQuery Connector不支持ARRAY類型?
完美。解決了。感謝複製/粘貼代碼塊。 –