2017-01-10 111 views

Hive - Update a partition column

I have a Hive table partitioned by date and product type:

product_id, sale_id, date, product_type 
42342423, 43423, 2017-01-01, S 
67867868, 23233, 2017-01-01, C 
53453466, 63423, 2017-02-01, S 

I need to update every product_type value from 'S' to 'T' (shirts to tops) in this partitioned Hive table. Our Hive version does not support direct updates.

Other solutions posted for similar questions involve creating a new table and using INSERT OVERWRITE with a CASE statement, like:

INSERT OVERWRITE TABLE data.textile_sales PARTITION(date='2017-01-01') 
select product_id, sale_id, case when product_type = 'S' then 'T' end as product_type, date 

But this does not work when the column to be updated is a partition column.

Is there another way to solve this?


P.S. `case when product_type = 'S' then 'T' end as product_type`: this expression yields NULL for every product_type that is not 'S'.

Answer


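The "data" of a partition column is actually metadata tied to the directory names.
If you already have 'T' folders, move the files from their current date + product_type='S' folder to the corresponding date + product_type='T' folder.
If you do not have 'T' folders, you can simply rename the 'S' folders and update the partition list.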
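
For the rename case, Hive can also do the rename itself with ALTER TABLE ... PARTITION ... RENAME TO PARTITION, which updates the metastore entry and, for a managed table, should move the directory as well. Below is a minimal sketch that only prints the statements. It assumes the table is named product, the partition dates are the two used in the demo, and no 'T' partitions exist yet; rename_stmt is a hypothetical helper, not part of Hive.

```shell
#!/usr/bin/env bash
# Sketch only: print one RENAME PARTITION statement per date.
# Assumptions (not from the original post): table name `product`,
# string-typed partition columns, no existing product_type='T' partitions.
set -eu

rename_stmt() {
  # $1 = partition date, e.g. 2017-01-01
  # `date` is backtick-quoted because newer Hive versions treat it as a reserved word.
  printf "ALTER TABLE product PARTITION (\`date\`='%s', product_type='S') RENAME TO PARTITION (\`date\`='%s', product_type='T');\n" "$1" "$1"
}

# Print the statements for the dates that hold 'S' partitions.
for dt in 2017-01-01 2017-01-02; do
  rename_stmt "$dt"
done
```

Redirect the output to a file, review it, and run it with hive -f. Generating the statements first keeps the change easy to inspect before anything touches the metastore.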


Demo

hive> select * from product; 
OK 
67867868 23233 2017-01-01 C 
42342423 43423 2017-01-01 S 
53453466 63423 2017-01-02 S 

$ hdfs dfs -ls -R /user/hive/warehouse/product
drwxrwxrwx - training hive   0 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01 
drwxrwxrwx - training hive   0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C 
-rwxrwxrwx 1 training hive   15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C/000000_0 
drwxrwxrwx - training hive   0 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=S 
-rwxrwxrwx 1 training hive   15 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=S/000000_0 
drwxrwxrwx - training hive   0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02 
drwxrwxrwx - training hive   0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=S 
-rwxrwxrwx 1 training hive   15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=S/000000_0 

$ hdfs dfs -mkdir /user/hive/warehouse/product/date=2017-01-01/product_type=T
$ hdfs dfs -mkdir /user/hive/warehouse/product/date=2017-01-02/product_type=T
$ hdfs dfs -mv /user/hive/warehouse/product/date=2017-01-01/product_type=S/000000_0 /user/hive/warehouse/product/date=2017-01-01/product_type=T/000000_0
$ hdfs dfs -mv /user/hive/warehouse/product/date=2017-01-02/product_type=S/000000_0 /user/hive/warehouse/product/date=2017-01-02/product_type=T/000000_0

$ hdfs dfs -ls -R /user/hive/warehouse/product
drwxrwxrwx - training hive   0 2017-01-10 13:41 /user/hive/warehouse/product/date=2017-01-01 
drwxrwxrwx - training hive   0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C 
-rwxrwxrwx 1 training hive   15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C/000000_0 
drwxrwxrwx - training hive   0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-01/product_type=S 
drwxrwxrwx - training hive   0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-01/product_type=T 
-rwxrwxrwx 1 training hive   15 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=T/000000_0 
drwxrwxrwx - training hive   0 2017-01-10 13:41 /user/hive/warehouse/product/date=2017-01-02 
drwxrwxrwx - training hive   0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-02/product_type=S 
drwxrwxrwx - training hive   0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-02/product_type=T 
-rwxrwxrwx 1 training hive   15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=T/000000_0 

hive> msck repair table product; 
OK 
Partitions not in metastore: product:date=2017-01-01/product_type=T product:date=2017-01-02/product_type=T 
Repair: Added partition to metastore product:date=2017-01-01/product_type=T 
Repair: Added partition to metastore product:date=2017-01-02/product_type=T 
Time taken: 0.409 seconds, Fetched: 3 row(s) 

hive> select * from product; 
OK 
67867868 23233 2017-01-01 C 
42342423 43423 2017-01-01 T 
53453466 63423 2017-01-02 T 
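
Note that the demo leaves the now empty product_type=S directories in place, and msck repair only adds partitions here, so the 'S' partitions are presumably still registered in the metastore. A hedged sketch for cleaning them up follows. It assumes the same product table and dates as the demo; drop_stmt is a hypothetical helper that only prints the statements.

```shell
#!/usr/bin/env bash
# Sketch only: print DROP PARTITION statements for the emptied 'S' partitions.
# Assumption: the table is managed, so dropping a partition should also
# remove its (already empty) directory.
set -eu

drop_stmt() {
  # $1 = partition date, e.g. 2017-01-01
  printf "ALTER TABLE product DROP IF EXISTS PARTITION (\`date\`='%s', product_type='S');\n" "$1"
}

for dt in 2017-01-01 2017-01-02; do
  drop_stmt "$dt"
done
```

After running the generated statements with hive -f, show partitions product should list only the 'C' and 'T' partitions.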

Are you still there?


I don't have access to change the data files, so I ended up reloading the affected data with the new values. – Craig
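
One way such a reload could look, without touching HDFS directly, is a dynamic-partition insert that copies the 'S' rows into 'T' partitions and then drops the old partitions. This is only a sketch under assumptions, not necessarily what was actually run: it assumes the product table from the demo with columns product_id and sale_id and partition columns date and product_type, and reload_hql is a hypothetical helper that prints the HiveQL.

```shell
#!/usr/bin/env bash
# Sketch only: print HiveQL that reloads 'S' rows into 'T' partitions.
# Assumptions: table `product`, partition columns (`date`, product_type),
# dynamic partitioning permitted on the cluster.
set -eu

reload_hql() {
  cat <<'EOF'
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Copy every 'S' row into the matching 'T' partition.
-- Dynamic partition columns must come last, in partition order.
INSERT INTO TABLE product PARTITION (`date`, product_type)
SELECT product_id, sale_id, `date`, 'T' AS product_type
FROM product
WHERE product_type = 'S';

-- Only after verifying the copy: drop the old 'S' partitions.
-- A partial partition spec matches every date.
ALTER TABLE product DROP IF EXISTS PARTITION (product_type = 'S');
EOF
}

reload_hql
```

Verify the row counts in the 'T' partitions before running the final statement, since dropping a partition of a managed table also deletes its data.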