2015-08-09 60 views

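Multiplication after joining data in Pig

I want to multiply two fields and then take the sum of the products after joining three tables in Pig. However, I keep getting this error: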

<file loyalty_program.pig, line 30, column 74> (Name: Multiply Type: null Uid: null)incompatible types in Multiply Operator left hand side:bag :tuple(new_details1::new_details::potential_customers::num_of_orders:long) right hand side:bag :tuple(products::price:int)

-- load the data sets 
orders = LOAD '/dualcore/orders' AS (order_id:int, 
      cust_id:int, 
      order_dtm:chararray); 

details = LOAD '/dualcore/order_details' AS (order_id:int, 
      prod_id:int); 

products = LOAD '/dualcore/products' AS (prod_id:int, 
      brand:chararray, 
      name:chararray, 
      price:int, 
      cost:int, 
      shipping_wt:int); 
recent = FILTER orders by order_dtm matches '2012-.*$'; 

customer = GROUP recent by cust_id; 

cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders; 

potential_customers = FILTER cust_orders by num_of_orders>=5; 

new_details = join potential_customers by cust_id, recent by cust_id; 
new_details1 = join new_details by order_id, details by order_id; 
new_details2 = join new_details1 by prod_id, products by prod_id; 
--DESCRIBE new_details2; 

final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt; 

grouped_data = GROUP final_details by cust_id; 

member = FOREACH grouped_data GENERATE SUM(final_details.num_of_orders * final_details.price) ; 
lim = limit member 10; 
dump lim; 

I even cast the result of COUNT to int, but it still keeps throwing this error. I don't know how to proceed.


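OK. Can you state the requirement here? Please give some sample input and the expected output. I would also like to know whether one order_id can have multiple prod_id values or not.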

Answer


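OK, I think you first want to multiply the number of orders by the price of each product, and then take the sum of those products.

It is a strange requirement, but you can handle it with the approach below.

The original statement fails because, after GROUP, final_details.num_of_orders and final_details.price are bags rather than scalar values, and the multiply operator is not defined for bags (this is what the error message reports). All you need to do is compute the product in the final_details FOREACH statement itself and then simply apply SUM to that product column.

Based on your LOAD statements, I created the input files below.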

main_orders.txt

6666,100,2012-01-01 
7777,101,2012-09-02 
8888,100,2012-01-09 
9999,101,2012-12-08 
6666,101,2012-09-02 
9999,100,2012-07-12 
9999,100,2012-08-01 
6666,100,2012-01-02 
7777,100,2012-09-09 

orders_details.txt

6666,6000 
7777,7000 
8888,8000 
9999,9000 

main_products.txt

6000,Nike,Shoes,3000,3000,1 
7000,Adidas,Cap,1000,1000,1 
8000,Rebook,Shoes,4000,4000,1 
9000,Puma,Shoes,25000,2500,1 

Here is the code:

orders = LOAD '/user/cloudera/inputfiles/main_orders.txt' USING PigStorage(',') AS (order_id:int,cust_id:int,order_dtm:chararray); 

details = LOAD '/user/cloudera/inputfiles/orders_details.txt' USING PigStorage(',') AS (order_id:int,prod_id:int); 

products = LOAD '/user/cloudera/inputfiles/main_products.txt' USING PigStorage(',') AS(prod_id:int,brand:chararray,name:chararray,price:int,cost:int,shipping_wt:int); 

recent = FILTER orders by order_dtm matches '2012-.*'; 

customer = GROUP recent by cust_id; 

cust_orders = FOREACH customer GENERATE group as cust_id, (int)COUNT(recent) as num_of_orders; 


potential_customers = FILTER cust_orders by num_of_orders>=5; 

new_details = join potential_customers by cust_id, recent by cust_id; 
new_details1 = join new_details by order_id, details by order_id; 
new_details2 = join new_details1 by prod_id, products by prod_id; 
DESCRIBE new_details2; 

final_details = FOREACH new_details2 GENERATE potential_customers::cust_id, potential_customers::num_of_orders as num_of_orders,recent::order_id as order_id,recent::order_dtm,details::prod_id,products::brand,products::name,products::price as price,products::cost,products::shipping_wt, (potential_customers::num_of_orders * products::price) as multiplied_price; -- the multiplication is done in this last field
dump final_details; 

grouped_data = GROUP final_details by cust_id; 

member = FOREACH grouped_data GENERATE SUM(final_details.multiplied_price) ; 
lim = limit member 10; 
dump lim; 

Just for clarity, I also dumped the output of the final_details FOREACH statement:

(100,6,6666,2012-01-01,6000,Nike,Shoes,3000,3000,1,18000) 
(100,6,6666,2012-01-02,6000,Nike,Shoes,3000,3000,1,18000) 
(100,6,7777,2012-09-09,7000,Adidas,Cap,1000,1000,1,6000) 
(100,6,8888,2012-01-09,8000,Rebook,Shoes,4000,4000,1,24000) 
(100,6,9999,2012-07-12,9000,Puma,Shoes,25000,2500,1,150000) 
(100,6,9999,2012-08-01,9000,Puma,Shoes,25000,2500,1,150000) 
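
The per-row product in the last column works because, inside final_details, both operands are plain int fields; once the relation is grouped, the same fields are only reachable as bags. A quick way to confirm this is to compare the two schemas. The following is only a sketch meant to be appended to the script above, reusing its relation names, and I have not reproduced the exact DESCRIBE output here:

```pig
-- Sketch: compare the schemas before and after grouping.
DESCRIBE final_details;  -- flat tuples: num_of_orders and price are int fields
DESCRIBE grouped_data;   -- the group key plus a bag named final_details

-- Rejected: both operands are bag projections, so Multiply has no valid types.
-- member = FOREACH grouped_data GENERATE SUM(final_details.num_of_orders * final_details.price);

-- Accepted: the bag projection of the precomputed column is a valid argument to SUM.
member = FOREACH grouped_data GENERATE SUM(final_details.multiplied_price);
```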

The final output is below:

(366000) 
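
If you would rather not carry an extra column through final_details, the same total can probably be computed inside a nested FOREACH on the grouped relation. This is an untested sketch that assumes a Pig version which allows FOREACH inside a nested block (0.10 or later, as far as I know); line_totals, line_total and total_value are names I made up for illustration, and emitting the group key as cust_id is my addition so that each total can be matched to its customer.

```pig
member = FOREACH grouped_data {
    -- multiply per row inside the bag, where both operands are scalar ints
    line_totals = FOREACH final_details GENERATE num_of_orders * price AS line_total;
    -- then aggregate the resulting single-column bag
    GENERATE group AS cust_id, SUM(line_totals.line_total) AS total_value;
};
lim = LIMIT member 10;
DUMP lim;
```

This form relies only on the num_of_orders and price aliases assigned in final_details, so it does not need the multiplied_price column.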

This code may help you, but please clarify your requirement again.