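How to import JSON from a file on Cloud Storage into BigQuery

I'm trying to import a file (json.txt) from Cloud Storage into BigQuery through the API, and the load fails with errors. When I do the same import through the web UI, it works without any errors (I even set maxBadRecords = 0). Can someone tell me what I'm doing wrong here? Is the code wrong, or do I need to change some setting in BigQuery?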
I have followed the BigQuery documentation on JSON imports. The file is a plain-text UTF-8 file with the following contents:
```
{"person_id":225,"person_name":"John","object_id":1}
{"person_id":226,"person_name":"John","object_id":1}
{"person_id":227,"person_name":"John","object_id":null}
{"person_id":229,"person_name":"John","object_id":1}
```
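To see which rows would trip a type check before uploading, the file can be validated locally against the same three-column schema that the load configuration uses. This is a minimal sketch with hypothetical names (`SCHEMA`, `check_ndjson`); it is not a reimplementation of BigQuery's parser, and it may reject values (quoted numbers, for example) that BigQuery itself would accept.

```ruby
require 'json'

# Hypothetical local pre-check mirroring the question's schema.
# All three columns are NULLABLE, so null or a missing key is allowed.
SCHEMA = {
  'person_id'   => :integer,
  'person_name' => :string,
  'object_id'   => :integer
}.freeze

# Returns one message per problem, loosely styled after BigQuery's
# "Line:N/Field:M" locations (here the field is named, not numbered).
def check_ndjson(text, schema = SCHEMA)
  problems = []
  text.each_line.with_index(1) do |line, lineno|
    next if line.strip.empty?
    begin
      row = JSON.parse(line)
    rescue JSON::ParserError
      problems << "Line:#{lineno} is not valid JSON"
      next
    end
    unless row.is_a?(Hash)
      problems << "Line:#{lineno} is not a JSON object"
      next
    end
    schema.each do |field, type|
      value = row[field]
      next if value.nil?
      ok = case type
           when :integer then value.is_a?(Integer)
           when :string  then value.is_a?(String)
           end
      problems << "Line:#{lineno}/Field:#{field} has #{value.inspect}" unless ok
    end
  end
  problems
end
```

Usage: `check_ndjson(File.read('json.txt')).each { |p| puts p }`. An empty result means every line parsed as a JSON object whose values match the expected types.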
When the import job runs, it reports the following error for every line: "Value cannot be converted to expected type."
```
      {
        "reason": "invalid",
        "location": "Line:15/Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "location": "Line:16/Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "location": "Line:17/Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "location": "Line:18/Field:1",
        "message": "Value cannot be converted to expected type."
      },
      {
        "reason": "invalid",
        "message": "Too many errors encountered. Limit is: 10."
      }
    ]
  },
  "statistics": {
    "creationTime": "1384484132723",
    "startTime": "1384484142972",
    "endTime": "1384484182520",
    "load": {
      "inputFiles": "1",
      "inputFileBytes": "960",
      "outputRows": "0",
      "outputBytes": "0"
    }
  }
}
```
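BigQuery's job resource separates a fatal `status.errorResult` from the per-row `status.errors` list, which is how a load job can reach state `DONE` while writing `outputRows: 0`. A hypothetical helper (`summarize_job`) that flattens a parsed `jobs.get` response into readable lines might look like this; it assumes the response has already gone through `JSON.parse`, as in the question's code.

```ruby
# Hypothetical helper: condenses a parsed BigQuery job resource (a Hash)
# into short human-readable lines. Assumes the documented layout:
# status.state, status.errorResult, status.errors, statistics.load.
def summarize_job(job)
  status = job.fetch('status', {})
  lines = ["state=#{status['state']}"]

  # errorResult is present only when the job as a whole failed.
  if (fatal = status['errorResult'])
    lines << "FAILED: #{fatal['message']}"
  end

  # Individual errors; the last one may have no location.
  Array(status['errors']).each do |err|
    where = err['location'] || '(no location)'
    lines << "#{where}: #{err['reason']}: #{err['message']}"
  end

  if (load = job.dig('statistics', 'load'))
    lines << "outputRows=#{load['outputRows']}"
  end
  lines
end
```

Printing `summarize_job(job_status)` inside the polling loop gives a compact view instead of dumping the whole hash with `puts job_status`.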
The file can be accessed here: http://www.sendspace.com/file/7q0o37
Here is my code, including the schema:
```ruby
require 'json'

# Loads a newline-delimited JSON file from Cloud Storage into a BigQuery
# table, then polls the job until it finishes. @client, @bigquery,
# @project_id and get_job_status are set up elsewhere (not shown).
def insert_and_import_table_in_dataset(tar_file, table, dataset = DATASET)
  config = {
    'configuration' => {
      'load' => {
        'sourceUris' => ["gs://test-bucket/#{tar_file}"],
        'schema' => {
          'fields' => [
            { 'name' => 'person_id',   'type' => 'INTEGER', 'mode' => 'NULLABLE' },
            { 'name' => 'person_name', 'type' => 'STRING',  'mode' => 'NULLABLE' },
            { 'name' => 'object_id',   'type' => 'INTEGER', 'mode' => 'NULLABLE' }
          ]
        },
        'destinationTable' => {
          'projectId' => @project_id.to_s,
          'datasetId' => dataset,
          'tableId'   => table
        },
        'sourceFormat'      => 'NEWLINE_DELIMITED_JSON',
        'createDisposition' => 'CREATE_IF_NEEDED',
        'maxBadRecords'     => 10
      }
    }
  }

  result = @client.execute(
    :api_method  => @bigquery.jobs.insert,
    :parameters  => {
      # 'uploadType' => 'resumable',
      :projectId => @project_id.to_s,
      :datasetId => dataset
    },
    :body_object => config
  )
  # upload = result.resumable_upload
  # @client.execute(upload) if upload.resumable?
  puts result.response.body
  json = JSON.parse(result.response.body)

  # Poll every 5 seconds. DONE only means the job has finished: a failed
  # load also ends in DONE, with status.errorResult set.
  loop do
    job_status = get_job_status(json['jobReference']['jobId'])
    state = job_status['status']['state']
    if state == 'DONE'
      if (error = job_status['status']['errorResult'])
        puts "FAILED: #{error['message']}"
        return false
      end
      puts 'DONE'
      return true
    end
    puts state
    puts job_status
    sleep 5
  end
end
```
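Since the method's parameter is named `tar_file` and the follow-up question mentions `tar.gz`, it may be worth confirming what the object in the bucket actually contains. As far as I know, BigQuery load jobs accept gzip-compressed input but do not unpack tar archives, so a tar file would be read as tar header blocks plus file contents rather than bare JSON lines. The sketch below is a hypothetical diagnostic (`sniff_payload`, `tar_header?` are illustrative names) based on two well-known magic numbers: gzip streams start with bytes `0x1f 0x8b`, and tar headers carry `ustar` at byte offset 257.

```ruby
require 'zlib'
require 'stringio'

# Gzip streams begin with the two magic bytes 0x1f 0x8b.
GZIP_MAGIC = "\x1f\x8b".b

# POSIX ("ustar\0") and GNU ("ustar ") tar headers both carry "ustar"
# at byte offset 257 of the 512-byte header block.
def tar_header?(bytes)
  bytes.b[257, 5] == 'ustar'
end

# Hypothetical diagnostic: returns :plain, :gzip, :tar or :tar_gz.
def sniff_payload(bytes)
  bytes = bytes.b # compare raw bytes, not characters
  unless bytes.start_with?(GZIP_MAGIC)
    return tar_header?(bytes) ? :tar : :plain
  end

  # Only the first header block matters, so decompress just 512 bytes.
  head = Zlib::GzipReader.wrap(StringIO.new(bytes)) { |gz| gz.read(512) } || ''
  tar_header?(head) ? :tar_gz : :gzip
end
```

Usage: `sniff_payload(File.binread('json.txt'))` on the local copy of the file that was uploaded. Only `:plain` or `:gzip` describes newline-delimited JSON that a load job can read directly.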
Could someone please tell me what I'm doing wrong? How do I fix it, and where?
Also, at some point I expect to use compressed files and import from them. Will "tar.gz" work, or does it need to be plain ".gz"?
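On the compression question: as far as I know, BigQuery can load gzip-compressed newline-delimited JSON, whereas a `.tar.gz` is a tar archive that was gzipped afterwards, so decompressing it yields tar headers around the data rather than bare JSON lines. A minimal sketch that writes a plain gzip stream (no tar wrapper) using Ruby's standard `Zlib::GzipWriter`; the function name and file names are placeholders.

```ruby
require 'zlib'

# Compress a newline-delimited JSON file into a single gzip stream,
# copying it line by line. No tar archive is involved.
def gzip_ndjson(src_path, dest_path)
  Zlib::GzipWriter.open(dest_path) do |gz|
    File.foreach(src_path) { |line| gz.write(line) }
  end
  dest_path
end
```

Usage: `gzip_ndjson('json.txt', 'json.txt.gz')`, then upload the result and point `sourceUris` at it, e.g. `gs://test-bucket/json.txt.gz` (the object name here is hypothetical).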
Thanks a lot for your help. Much appreciated.
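Thanks, Jordan. That was it. It works fine now and the data imports into BigQuery. Really appreciate all the help! Have a nice day. I'll update the question with more configuration details so others can use it. – user2989892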