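Can someone help me with the following problem? I am trying to analyze security logs to find false alerts. A false alert is one that contains "TXT not created", and an alert is true when it contains "txt not created". How can I extract the specific "txt not created" entries from the data source (sample input data is given below)? I want to use Python MapReduce to identify the false alerts. This is what I have so far: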
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        words = line.split()
        for word in words:
            word = unicode(word, "utf-8", errors="ignore")
            yield word, 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()
The sample input is given here:
Mon Feb 1 12:13:59 EST 2016 virtual user etransactiondev started to upload file
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.TXT
/export/home/pub/etransactiondev/uploads/etransactionenvironment/ABC/rrd/in/WCWT.SMR.XYZ0002.PLSE.INPUT01.LFEP_APOL_D_M_20160201171358.txt was not created
> "TXT not created" and true when "txt not created". Is there a mistake here, or is the difference really just the case of the two words 'TXT' and 'txt'? – DAXaholic
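
If the distinction really is only the case of the extension, one possible approach is to have the mapper emit a label per alert line instead of a count per word. The following is a minimal sketch in plain Python so that it runs without mrjob installed; the `mapper` and `reducer` bodies could be moved into the corresponding MRJob methods. It assumes that an alert line ends with "was not created", and that an uppercase `TXT` marks a false alert while a lowercase `txt` marks a true one. The label names, the `run` helper, and the `mixed_case` bucket are hypothetical and not part of the original question.

```python
import re
from collections import defaultdict

# Assumption: an alert line ends with "<path>.<ext> was not created",
# and only the case of the extension separates false from true alerts.
ALERT_RE = re.compile(r"\.(txt)\s+was not created\s*$", re.IGNORECASE)


def mapper(line):
    """Yield (label, 1) for each line that is a 'was not created' alert."""
    match = ALERT_RE.search(line)
    if match is None:
        return  # not an alert line, emit nothing
    ext = match.group(1)
    if ext == "TXT":
        yield "false_alert", 1  # assumed: uppercase means false alert
    elif ext == "txt":
        yield "true_alert", 1   # assumed: lowercase means true alert
    else:
        yield "mixed_case", 1   # e.g. "Txt"; kept separate for inspection


def reducer(key, values):
    """Sum the counts emitted for one label."""
    yield key, sum(values)


def run(lines):
    """Local stand-in for the shuffle step: group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(pair for key in grouped for pair in reducer(key, grouped[key]))
```

Calling `run` on a list of log lines returns a dictionary that maps each label to the number of matching lines. Lines that do not end in "was not created", such as the "started to upload file" line in the sample, are ignored.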