2015-10-17 45 views

Hadoop Pig: Displaying entries using STARTSWITH

I am having trouble with the STARTSWITH string function. I want to display all records whose System_Period starts with 20040.

transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv' 
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int, 
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int, 
Product_Number:int, Sales_Amount:double, Employee_Number:int, 
Service_Date:chararray, System_Period:int); 

sysGroup = GROUP transactions BY System_Period; 

sysFilter = FILTER sysGroup BY STARTSWITH(transactions.System_Period, 20040); 

DUMP sysFilter; 

The error I get is:

Could not infer the matching function for org.apache.pig.builtin.STARTSWITH as multiple or none of them fit. Please use an explicit cast. 

Answer

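STARTSWITH compares two strings and checks whether the first one begins with the second. You cannot pass it a relation or a bag, and after the GROUP, transactions.System_Period is a bag. Also note that it only accepts strings (chararray), not integers. Either load System_Period as chararray and filter for values starting with 20040 before the GROUP BY, casting it back as required after the filter: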

transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv' 
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int, 
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int, 
Product_Number:int, Sales_Amount:double, Employee_Number:int, 
Service_Date:chararray, System_Period:chararray); 
sysFilter = FILTER transactions BY STARTSWITH(System_Period, '20040'); 
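
If the integer type is needed again after filtering, the cast mentioned above could look roughly like this. It is only a sketch: `sysTyped` is a hypothetical alias that is not part of the original answer, and it assumes the `sysFilter` relation defined just above and a Pig version that supports project-range expressions (`..`).

```pig
-- Hypothetical follow-up step: turn System_Period back into an int after the filter.
-- The range expression keeps every column from Branch_Number through Service_Date.
sysTyped = FOREACH sysFilter GENERATE Branch_Number .. Service_Date,
           (int)System_Period AS System_Period;

-- Print the schema to confirm that System_Period is an int again.
DESCRIBE sysTyped;
```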

Or FLATTEN the result of the GROUP BY and then filter:

transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv' 
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int, 
Customer_Number:int,Invoice_Date:chararray, Invoice_Number:int, 
Product_Number:int, Sales_Amount:double, Employee_Number:int, 
Service_Date:chararray, System_Period:chararray); 
sysGroup = GROUP transactions BY System_Period; 
flatres = FOREACH sysGroup GENERATE group,FLATTEN(transactions); 
sysFilter = FILTER flatres BY STARTSWITH(System_Period, '20040');
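
The error message itself suggests an explicit cast, so another option may be to keep System_Period as an int in the schema and cast it only inside the filter. This is a sketch under that assumption, reusing the path and field names from the question; it is not taken from the original answer.

```pig
-- Sketch: System_Period stays an int in the schema, as in the question.
transactions = LOAD '/home/cloudera/datasets/assignment2/Transactions.csv'
USING PigStorage(',') AS (Branch_Number:int, Contract_Number:int,
Customer_Number:int, Invoice_Date:chararray, Invoice_Number:int,
Product_Number:int, Sales_Amount:double, Employee_Number:int,
Service_Date:chararray, System_Period:int);

-- Cast the int to chararray only for the prefix test; the second argument is a string literal.
sysFilter = FILTER transactions BY STARTSWITH((chararray)System_Period, '20040');

DUMP sysFilter;
```

With this variant the filtered records keep System_Period as an int, so no cast back is needed afterwards.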