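Pig Latin (filtering a second data source inside a foreach loop)
I have two data sources. One contains a list of API calls, and the other contains all the related authentication events. Each API call can have multiple auth events, and I want to find the auth event that:
a) contains the same "identifier" as the API call
b) occurs within one second after the API call
c) is closest to the API call once the filters above have been applied
My plan was to loop over each API call event in a foreach and filter authEvents inside that loop to find the right one. However, that does not appear to be possible (see "USING Filter in a Nested FOREACH in PIG").
Can anyone suggest another way to achieve this? If it helps, here is the Pig script I tried to use: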
apiRequests = LOAD '/Documents/ApiRequests.txt' AS (api_fileName:chararray, api_requestTime:long, api_timeFromLog:chararray, api_call:chararray, api_leadString:chararray, api_xmlPayload:chararray, api_sourceIp:chararray, api_username:chararray, api_identifier:chararray);
authEvents = LOAD '/Documents/AuthEvents.txt' AS (auth_fileName:chararray, auth_requestTime:long, auth_timeFromLog:chararray, auth_call:chararray, auth_leadString:chararray, auth_xmlPayload:chararray, auth_sourceIp:chararray, auth_username:chararray, auth_identifier:chararray);
specificApiCall = FILTER apiRequests BY api_call == 'CSGetUser'; -- Get all events for this specific call
match = foreach specificApiCall { -- Now try to get the closest matching auth event
filtered1 = filter authEvents by auth_identifier == api_identifier; -- Only use auth events that have the same identifier (this will return several)
filtered2 = filter filtered1 by (auth_requestTime-api_requestTime)<1000; -- Further refine by using auth events within a second of the API call's time
sorted = order filtered2 by auth_requestTime; -- Get the auth event that's closest to the api call
limited = limit sorted 1;
generate limited;
};
dump match;
Thanks 小熊, I used COGROUP and it works a treat. You're the best! – Hinchy
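For anyone landing here later, a rough sketch of how the COGROUP route might look. This is not the exact script from the answer, just one way to express the three conditions. It reuses the `specificApiCall` and `authEvents` aliases from the script in the question. The aliases `grouped`, `pairs`, `candidates`, `perCall` and `closestAuth` are made up for illustration. Two assumptions to check against your data: that (`api_fileName`, `api_requestTime`, `api_identifier`) is enough to tell one API call from another, and that "within one second after" means the auth event must not be earlier than the call (the filter in the question would also let earlier auth events through, since their difference is negative).

```pig
-- Bring together API calls and auth events that share an identifier.
grouped = COGROUP specificApiCall BY api_identifier, authEvents BY auth_identifier;

-- Flattening both bags yields one row per (API call, auth event) pair;
-- identifiers that have no row on one side drop out here.
pairs = FOREACH grouped GENERATE FLATTEN(specificApiCall), FLATTEN(authEvents);

-- Keep only auth events at or after the API call and less than 1000 ms later.
candidates = FILTER pairs BY auth_requestTime >= api_requestTime
                         AND (auth_requestTime - api_requestTime) < 1000L;

-- Regroup per API call (assumed key, see the note above).
perCall = GROUP candidates BY (api_fileName, api_requestTime, api_identifier);

-- ORDER and LIMIT are allowed inside a nested FOREACH because they act on
-- the bag that belongs to the current group, not on a separate relation.
closestAuth = FOREACH perCall {
    sorted  = ORDER candidates BY auth_requestTime;
    nearest = LIMIT sorted 1;
    GENERATE FLATTEN(nearest);
};

DUMP closestAuth;
```

Because every surviving auth event is at or after its API call, ordering by `auth_requestTime` ascending and taking the first row picks the one closest to the call. API calls with no auth event in the window will not show up in the output at all.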