Loading a map data type in Hive using a Python script as the reducer

In one column of a Hive table, I want to store key-value pairs. Hive's complex data type MAP supports that construct.

(This is only a toy example of what I want to be able to do; I have many columns that I want to compress this way.)

So I created a table like this:

hive>DESCRIBE transaction_detailed; 
OK 
id STRING 
time STRING 
Time taken: 0.181 seconds 

hive>DROP TABLE IF EXISTS transactions; 
hive>CREATE EXTERNAL TABLE transactions(
    id STRING, 
    time_map MAP<STRING, INT> 
    ) 
partitioned by (dt string) 
row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':' lines terminated by '\n' 
location 's3://my_loaction/transactions/';

Then I try to load the map column using a reducer, as described in the code below. The structure of time_map looks like this: {"min": time, "max": time, "average": time, "total": time}

hive>FROM(FROM transaction_detailed 
MAP transaction_detailed.id, transaction_detailed.time 
USING "python unity mapper -- splits the same thing out as it takes it" 
AS id, time 
cluster by id) transaction_time_map 
insert overwrite table transactions partition(dt="2013-27-03") 
REDUCE transaction_time_map.id, transaction_time_map.time 
USING "python reducer which takes time_stamp sequence for a single id and summarizes them using min, max, average and total and supposed to insert into map" 
as id, time_map;
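For concreteness, here is a minimal sketch of what such a reducer could look like. It is not the actual script: the function names (`summarize`, `format_map`, `reduce_lines`) are made up for illustration, the times are assumed to parse as integers, and the average is truncated to an integer so that every value fits MAP<STRING, INT>. It relies on `cluster by id` delivering rows with the same id on adjacent lines, and it emits the map as plain text using the table's delimiters (',' between items, ':' between key and value).

```python
import sys
from itertools import groupby


def summarize(times):
    """Return min, max, average and total for a non-empty list of ints."""
    total = sum(times)
    return {
        "min": min(times),
        "max": max(times),
        "average": total // len(times),  # assumption: integer average is acceptable
        "total": total,
    }


def format_map(stats):
    """Serialize as key:value pairs joined by ',' (the table's delimiters)."""
    return ",".join(f"{key}:{value}" for key, value in stats.items())


def reduce_lines(lines):
    """Yield 'id<TAB>map_string' for each run of adjacent lines sharing an id."""
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for row_id, group in groupby(rows, key=lambda row: row[0]):
        times = [int(row[1]) for row in group]
        yield f"{row_id}\t{format_map(summarize(times))}"


if __name__ == "__main__":
    # Hive streams tab-separated rows on stdin and reads rows back from stdout.
    for out in reduce_lines(sys.stdin):
        print(out)
```

Note that whatever the script prints reaches Hive as a plain string column, not as a map.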

But I get an error like this:

FAILED: Error in semantic analysis: Line 6:23 Cannot insert into target table because column number/types are different "two_day": Cannot convert column 8 from string to map<string,int>.

How can I load the map column using my Python reducer?

Source

2013-03-27 darshan

I think the answer to the above question is to use Hive's str_to_map(text[, delimiter1, delimiter2]) function.
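To illustrate what the two delimiters mean, here is a rough Python emulation of the splitting that str_to_map performs on a string such as min:1,max:5,average:3,total:6. This is only a sketch of my understanding, not Hive's implementation: Hive treats the delimiters as patterns and has its own handling of malformed pairs, which this version ignores.

```python
def str_to_map(text, delimiter1=",", delimiter2=":"):
    """Rough emulation: split text into pairs on delimiter1,
    then split each pair into key and value on delimiter2."""
    result = {}
    for pair in text.split(delimiter1):
        key, _, value = pair.partition(delimiter2)
        result[key] = value  # values stay strings, they are not converted
    return result


if __name__ == "__main__":
    print(str_to_map("min:1,max:5,average:3,total:6"))
```

In the query, this would presumably mean declaring the reducer's second output column as a string and wrapping it in str_to_map(..., ',', ':') in an outer SELECT before the insert. As far as I know, str_to_map returns map<string,string>, so the target column type may also need to be adjusted from MAP<STRING, INT>.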

Source

2013-03-28 03:39:44 darshan
