Identifying false alarms using Python MapReduce

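Can someone help me with the following problem? I am trying to analyze security logs to find false alarms. An alarm is false when the line contains "TXT was not created", and true when it contains "txt was not created". How can I extract the specific "txt was not created" lines from the data source (sample input data is given below)?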

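from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        # Emit (word, 1) for every whitespace-separated token in the line.
        words = line.split()
        for word in words:
            # Python 2: decode the byte string, dropping undecodable bytes.
            word = unicode(word, "utf-8", errors="ignore")
            yield word, 1

    def reducer(self, key, values):
        # Sum the counts emitted for each word.
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()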

Sample input is given here:

Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file 
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.TXT 
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.txt was not created

Source

2016-04-24 Shiv

> "TXT was not created" and true when "txt was not created". Is there a mistake, or is the difference really just the letter case of 'TXT' versus 'txt'? – DAXaholic

Could you just check the first word?

word = word.split(' ')
if word[0] == 'TXT':
    pass  # do something...
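In the sample input the "TXT"/"txt" token is the file extension at the end of the path, not the first word, so the same single-token check can be applied there instead. The following is only a minimal sketch, assuming the rule from the question is purely the letter case of the extension directly before "was not created". The names SUFFIX, classify, mapper and reducer are hypothetical, and the functions are written as plain Python so they can be tried without a cluster.

```python
SUFFIX = " was not created"  # assumed marker text, taken from the sample input


def classify(line):
    """Return 'false_alarm', 'true_alarm', or None for one log line."""
    line = line.strip()
    if not line.endswith(SUFFIX):
        return None  # not a "was not created" line at all
    path = line[:-len(SUFFIX)]
    if path.endswith(".TXT"):
        return "false_alarm"  # upper-case extension: false alarm per the question
    if path.endswith(".txt"):
        return "true_alarm"   # lower-case extension: real alarm per the question
    return None


def mapper(_, line):
    # Emit (label, 1) only for lines that match one of the two patterns.
    label = classify(line)
    if label is not None:
        yield label, 1


def reducer(key, values):
    # Total the number of lines seen for each label.
    yield key, sum(values)
```

If this fits the data, the two functions could be moved into an MRJob subclass as its mapper and reducer methods by adding the self parameter.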

Source

2016-04-28 06:20:38 kermitvomit

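Thanks for the answer, kermitvomit. At the moment I am trying to extract the user name from the input file. Can you help me extract the user name from an input line such as: Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file. I need to extract etransactiondev – Shiv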
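One possible way to pull the user name out of such a line is a regular expression. This is only a sketch, assuming every upload line contains the literal phrase "virtual user <name> started to upload file" as in the sample input. The names USER_RE, extract_user and mapper are hypothetical.

```python
import re

# Assumed line shape: "... virtual user <name> started to upload file ..."
USER_RE = re.compile(r"virtual user (\S+) started to upload file")


def extract_user(line):
    """Return the user name found in an upload line, or None if absent."""
    match = USER_RE.search(line)
    return match.group(1) if match else None


def mapper(_, line):
    # Emit (user, 1) for each upload line so a reducer can count per user.
    user = extract_user(line)
    if user is not None:
        yield user, 1
```

For the sample line above, extract_user returns etransactiondev. Lines without the phrase are skipped by the mapper.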
