What kinds of Spark RDDs can be saved to a BigQuery table with saveAsNewAPIHadoopDataset? In the example from Using the BigQuery Connector with Spark, the write looks like this:
// Perform word count.
val wordCounts = (tableData
.map(entry => convertToTuple(entry._2))
.reduceByKey(_ + _))
// Write data back into a new BigQuery table.
// IndirectBigQueryOutputFormat discards keys, so set key to null.
(wordCounts
.map(pair => (null, convertToJson(pair)))
.saveAsNewAPIHadoopDataset(conf))
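For reference, convertToTuple and convertToJson are the small helpers defined in that example; from memory they look roughly like this (the word and word_count field names come from the public word-count sample, so treat this as a paraphrase rather than the exact code):

import com.google.gson.JsonObject

// Parse one BigQuery input record into a (word, count) pair.
def convertToTuple(record: JsonObject): (String, Long) = {
  val word = record.get("word").getAsString.toLowerCase
  val count = record.get("word_count").getAsLong
  (word, count)
}

// Turn a (word, count) pair back into a JsonObject for the output table.
def convertToJson(pair: (String, Long)): JsonObject = {
  val json = new JsonObject()
  json.addProperty("word", pair._1)
  json.addProperty("word_count", pair._2)
  json
}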
If I remove the .reduceByKey(_ + _) part from that example, I get the following error:
org.apache.spark.SparkException: Job aborted.
  at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:107)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
  at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1084)
  ... 46 elided
Caused by: java.io.IOException: Schema has no fields. Table: test_output_40b400dc_1bfe_454a_9aa8_bf9562d54c3f_source
  at com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion(BigQueryUtils.java:95)
  at com.google.cloud.hadoop.io.bigquery.BigQueryHelper.importFromGcs(BigQueryHelper.java:164)
  at com.google.cloud.hadoop.io.bigquery.output.IndirectBigQueryOutputCommitter.commitJob(IndirectBigQueryOutputCommitter.java:57)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:128)
  at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:101)
  ... 53 more
In some cases I do not use reduceByKey at all and still want to save my RDD to BigQuery.
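Concretely, what I am trying is essentially the example above with the aggregation step dropped, roughly like this sketch (same conf, convertToTuple and convertToJson as before):

// Same pipeline as the word-count example, but without reduceByKey:
// write each (word, count) tuple straight back out to BigQuery.
val rows = tableData.map(entry => convertToTuple(entry._2))
(rows
  .map(pair => (null, convertToJson(pair)))
  .saveAsNewAPIHadoopDataset(conf))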
Can you add the complete error? Is it caused by the code change you made? – mrsrinivas
Yes, the error only occurs after my code change. – bignano