First, load all the data. Think of each dataset as a table.
-- PigStorage() with no argument expects tab-delimited input.
cust_data = LOAD '/your/path/to/customer/data' USING PigStorage() AS (uniqueId: int, customerId: int, name: chararray);
store_data = LOAD '/your/path/to/store/data' USING PigStorage() AS (uniqueId: int, storeNum: int, name: chararray);
product_data = LOAD '/your/path/to/product/data' USING PigStorage() AS (uniqueId: int, sku: int, productName: chararray);
You can check the schema of each loaded relation with DESCRIBE:
DESCRIBE cust_data;
DESCRIBE store_data;
DESCRIBE product_data;
Next, join the customer and store data on uniqueId (this is an equi-join):
cust_store_join = JOIN cust_data BY uniqueId, store_data BY uniqueId;
Then generate the columns you need:
cust_store = FOREACH cust_store_join GENERATE cust_data::uniqueId as uniqueId, cust_data::customerId as customerId, cust_data::name as cust_name, store_data::storeNum as storeNum, store_data::name as store_name;
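The `::` prefixes are needed because a JOIN keeps the fields from both inputs, and both relations here have a `uniqueId` and a `name`. Pig prefixes each field with the alias it came from so they stay distinguishable. As a quick sketch, you can inspect the joined relation before projecting; the field list in the comment is what the LOAD schemas above should produce, not captured output:

```pig
-- Inspect the schema of the joined relation before projecting columns.
DESCRIBE cust_store_join;
-- Each field should carry its source alias as a prefix:
--   cust_data::uniqueId, cust_data::customerId, cust_data::name,
--   store_data::uniqueId, store_data::storeNum, store_data::name
```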
Now join the customer-store relation with the product data on uniqueId (again an equi-join):
cust_store_product_join = JOIN cust_store BY uniqueId, product_data BY uniqueId;
Finally, generate all the required columns:
customer_store_product = FOREACH cust_store_product_join GENERATE cust_store::uniqueId as uniqueId, cust_store::customerId as customerId, cust_store::cust_name as cust_name, cust_store::storeNum as storeNum, product_data::sku as sku, product_data::productName as productName;
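Now store the required columns in a local or HDFS directory. The STORE command below writes every row whose uniqueId matches in all three tables (customer, store, and product):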
STORE customer_store_product INTO '/your/output/path' USING PigStorage(',');
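Since all three joins use the same key, the two-step join can probably be collapsed into one statement, because Pig's JOIN accepts more than two relations for inner joins. The following is a minimal sketch under that assumption. The aliases `all_join` and `all_cols` are hypothetical names introduced here; the projected columns mirror customer_store_product:

```pig
-- Inner equi-join of all three relations on uniqueId in one statement.
all_join = JOIN cust_data BY uniqueId, store_data BY uniqueId, product_data BY uniqueId;

-- Project the same columns as customer_store_product.
-- Fields are referenced through their source alias to avoid ambiguity.
all_cols = FOREACH all_join GENERATE
    cust_data::uniqueId       AS uniqueId,
    cust_data::customerId     AS customerId,
    cust_data::name           AS cust_name,
    store_data::storeNum      AS storeNum,
    product_data::sku         AS sku,
    product_data::productName AS productName;
```

Note that this form only works for inner joins. If you later need a LEFT or FULL OUTER join, Pig supports those for two relations at a time, so the step-by-step approach is the one to keep.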
Similarly, you can join your list1 schema, using the same logic to generate the columns and store the data. Hope this helps.