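Generating a count of a substring (Apache Pig)
I am trying to count orders by a specific portion of a string field. My A is correct, but I am having trouble getting B to work properly. I have included the comments from the lab to help explain certain parts of the code.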
data = LOAD '/dualcore/orders' AS (order_id:int,
cust_id:int,
order_dtm:chararray);
/*
* Include only records where the 'order_dtm' field matches
* the regular expression pattern:
*
* ^ = beginning of string
* 2013 = literal value '2013'
* 0[2345] = 0 followed by 2, 3, 4, or 5
* - = a literal character '-'
* \\d{2} = exactly two digits
* \\s = a single whitespace character
* .* = any number of any characters
* $ = end of string
*
* If you are not familiar with regular expressions and would
* like to know more about them, see the Regular Expression
* Reference at the end of the Exercise Manual.
*/
recent = FILTER data by order_dtm matches '^2013-0[2345]-\\d{2}\\s.*$';
-- TODO (A): Create a new relation with just the order's year and month
A = FOREACH data GENERATE SUBSTRING(order_dtm,0,7);
-- TODO (B): Count the number of orders in each month
B = FOREACH data GENERATE COUNT_STAR(A);
-- TODO (C): Display the count by month to the screen.
DUMP C;
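For reference, the three TODO steps might be completed along the following lines. This is a sketch under stated assumptions, not the lab's official solution. It assumes the year-and-month relation should be built from the filtered relation `recent` rather than from `data`, and that dumping the counted relation is enough for step C. The alias `by_month` and the field names `order_month` and `num_orders` are made up for illustration. The main point is that `COUNT_STAR` operates on a bag, so the relation has to be grouped before it can be counted.

```pig
-- Sketch only: same input path and schema as in the script above.
data = LOAD '/dualcore/orders' AS (order_id:int,
                                   cust_id:int,
                                   order_dtm:chararray);

recent = FILTER data BY order_dtm MATCHES '^2013-0[2345]-\\d{2}\\s.*$';

-- (A) Keep just the year and month, e.g. '2013-02'.
-- Assumption: built from 'recent' so only the filtered months are counted.
A = FOREACH recent GENERATE SUBSTRING(order_dtm, 0, 7) AS order_month;

-- Group first: each group carries a bag named A holding that month's rows.
-- 'by_month' is a hypothetical alias.
by_month = GROUP A BY order_month;

-- (B) One output row per month, with the number of rows in its bag.
B = FOREACH by_month GENERATE group AS order_month, COUNT_STAR(A) AS num_orders;

-- (C) Display the count by month.
-- Assumption: dumping B directly is acceptable, so no relation C is defined.
DUMP B;
```

In the original script, `B = FOREACH data GENERATE COUNT_STAR(A);` refers to `A` from inside a `FOREACH` over `data`, where no bag called `A` exists, and `DUMP C;` refers to a relation that is never defined.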