PySpark cannot create a Parquet table in Hive

Many searches point to PySpark code like the following for creating a table in the Hive metastore:

hivecx.sql("...create table syntax that matches the dataframe...")
df.write.mode("overwrite").partitionBy('partition_colname').insertInto("national_dev.xh_claimline")

I have tried many variations of write/save/insertInto and mode, but I always get:

Caused by: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/national_dev.db/xh_claimline/000000_0

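The table directory exists, but the 000000_0 subdirectory (or subdirectories) does not. I assume this is because the table is empty and I have not written to it yet.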

hadoop fs -ls /user/hive/warehouse/national_dev.db/xh_claimline
Found 2 items
drwxrwxrwt - mryan hive 0 2017-03-20 12:26 /user/hive/warehouse/national_dev.db/xh_claimline/.hive-staging_hive_2017-03-20_12-26-35_382_2703713921168172595-1
drwxrwxrwt - mryan hive 0 2017-03-20 12:29 /user/hive/warehouse/national_dev.db/xh_claimline/.hive-staging_hive_2017-03-20_12-29-40_775_73045420253990110-1

This is on Cloudera. Spark version:

17/03/20 11:45:21 INFO spark.SparkContext: Running Spark version 1.6.0


2017-03-20 Matt Ryan

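Looking at the INSERT INTO statement: since the data is being written in overwrite mode, insertInto is not needed. Use saveAsTable with the parquet format directly. Here is the modified statement: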

df = hivecx.sql("...create table syntax that matches the dataframe...") 
df.write.mode("overwrite").format("parquet").partitionBy('partition_colname').saveAsTable("national_dev.xh_claimline")


2017-03-21 09:18:30

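Thanks @rakesh-kumar - I had tried that before, but I tried it again just now to be sure. I get exactly the same result: 'Caused by: java.io.FileNotFoundException: File does not exist: /user/hive/warehouse/national_dev.db/xh_claimline/000000_0'

@MattRyan Then I think you do not have a database named national_dev, so make sure the database exists by checking in the Hive shell.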
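The same check can also be made from PySpark instead of the Hive shell. The following is only a minimal sketch: database_exists is a hypothetical helper (not part of the PySpark API), and it assumes that sql_context is the HiveContext from the question (hivecx) and that SHOW DATABASES returns one row per database with the name in the first column.

```python
def database_exists(sql_context, db_name):
    """Return True if db_name appears in the output of SHOW DATABASES.

    Hypothetical helper: sql_context is assumed to be a HiveContext
    (Spark 1.6), or any object whose sql() result supports collect().
    """
    rows = sql_context.sql("SHOW DATABASES").collect()
    # Assumption: each row has a single column holding the database name.
    return any(row[0] == db_name for row in rows)
```

For example, database_exists(hivecx, "national_dev") should return True before the write is attempted. If it returns False, running hivecx.sql("CREATE DATABASE IF NOT EXISTS national_dev") first is one possible fix.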
