Python: parsing a log file for IP addresses and protocols

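This is my first question here on Stack Overflow, and I'm really looking forward to being part of this community. I'm new to programming, and Python is the first language many people recommended to me.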

Anyway. I have a log file similar to the following:

"No.","Time","Source","Destination","Protocol","Info" 
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..." 
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..." 
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" 
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..." 
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."

I'd like to use Python to parse the log file so that the result looks like this:

From IP 135.13.216.191 Protocol Count: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)

I'd really like some help solving this. What approach should I take: use a list and loop through it, or use a dictionary/tuple?
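One way to approach the list-versus-dictionary question: a dictionary keyed by (source IP, protocol) tuples counts each pair with a single lookup. A minimal sketch, assuming the rows have already been split into fields; the names `count_pairs` and `SAMPLE_ROWS` are hypothetical:

```python
from collections import Counter

# Hypothetical sample: (source, protocol) pairs taken from the log excerpt above
SAMPLE_ROWS = [
    ("120.107.103.180", "TELNET"),
    ("172.16.112.50", "TELNET"),
    ("172.16.113.168", "TCP"),
    ("172.16.113.168", "TELNET"),
    ("172.16.112.50", "TELNET"),
]


def count_pairs(rows):
    """Count how often each (source IP, protocol) pair occurs."""
    counts = Counter()
    for source, protocol in rows:
        # A tuple can be a dictionary key because tuples are hashable
        counts[(source, protocol)] += 1
    return counts


counts = count_pairs(SAMPLE_ROWS)
print(counts[("172.16.112.50", "TELNET")])  # 2
```

A list would also work, but finding the entry for a given pair means scanning the list each time, whereas a dictionary lookup goes straight to the key.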

Thanks in advance for your help!


2012-10-24 John Smith

Where does '135.13.216.191' come from? – Eric

It's just an example, but it comes from a row in the Destination field. –

First you need to read in the text file:

# Open the file; the with-statement closes it again automatically
with open('log_file.csv') as log_file:
    # readlines() will return the data as a list of strings, one for each line
    log_data = log_file.readlines()

Set up a dictionary to hold the results:

results = {}

Now loop over the data one line at a time and record the protocols in the dictionary:

for entry in log_data:
    # Remove the trailing newline and the outer quotes, then split on '","'
    # so the quotes don't end up in the IP addresses and protocol names
    entry_data = entry.strip().strip('"').split('","')
    # Skip the header row and any blank lines
    if entry_data[0] in ('No.', ''):
        continue
    # We are going to have a separate entry for each source ip
    # If we haven't already seen this ip, we need to make an entry for it
    if entry_data[2] not in results:
        results[entry_data[2]] = {'total': 0}
    # Now check to see if we've seen the protocol for this ip before
    # If we haven't, add a new entry set to 0
    if entry_data[4] not in results[entry_data[2]]:
        results[entry_data[2]][entry_data[4]] = 0
    # Now we increment the count for this protocol
    results[entry_data[2]][entry_data[4]] += 1
    # And we increment the total count
    results[entry_data[2]]['total'] += 1
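The two `if ... not in` checks can also be written with `dict.setdefault` and `dict.get`, which return an existing value or fall back to a default. A sketch of the same counting loop, assuming each line has already been split into a list of fields; `count_by_source` is a hypothetical name:

```python
def count_by_source(rows):
    """Build {source_ip: {'total': n, protocol: n, ...}} from split rows."""
    results = {}
    for entry_data in rows:
        source, protocol = entry_data[2], entry_data[4]
        # setdefault inserts {'total': 0} only the first time this IP is seen
        per_ip = results.setdefault(source, {'total': 0})
        # get returns 0 when this protocol hasn't been counted yet for this IP
        per_ip[protocol] = per_ip.get(protocol, 0) + 1
        per_ip['total'] += 1
    return results


rows = [
    ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
    ["4", "0.530125", "172.16.113.168", "172.16.112.50", "TELNET", "Telnet Data ..."],
]
print(count_by_source(rows))
# {'172.16.113.168': {'total': 2, 'TCP': 1, 'TELNET': 1}}
```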

Once you've counted everything, just loop over your counts and print out the results:

for ip in results:
    # Here we're printing a string with placeholders. The {0} and {1} will be filled
    # in by the call to format
    print("From IP {0} Protocol Count: {1}".format(
        ip,
        # And finally create the value for the protocol counts with another format call
        # The square brackets with the for statement inside create a list with one entry
        # for each key recorded for this ip (each protocol, plus 'total')
        # We use ' '.join to join the counts into one space-separated string
        ' '.join(["({0}: {1})".format(protocol, results[ip][protocol]) for protocol in results[ip]])))
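To match the format asked for in the question, with the total at the end, the 'total' key can be handled separately from the protocol counts. A sketch, assuming a per-IP dictionary shaped like the `results` entries above; `format_line` is a hypothetical helper, and sorting the protocols alphabetically is an added choice so the output order is stable:

```python
def format_line(ip, counts):
    """Format one IP's counts as 'From IP ... Protocol Count: (P n) ... (Total: n)'."""
    # Every key except 'total' is a protocol name
    parts = ["({0} {1})".format(protocol, counts[protocol])
             for protocol in sorted(counts) if protocol != 'total']
    parts.append("(Total: {0})".format(counts['total']))
    return "From IP {0} Protocol Count: {1}".format(ip, ' '.join(parts))


# The numbers from the example output in the question
counts = {'total': 63, 'TCP': 24, 'IMF': 1, 'SMTP': 38}
print(format_line('135.13.216.191', counts))
# From IP 135.13.216.191 Protocol Count: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)
```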


2012-10-24 20:33:01 Skunkwaffle

Thanks Skunkwaffle, I really like the simple code. –

@JohnSmith: Why not use the built-in [csv module](http://docs.python.org/library/csv.html) to parse your CSV file? – Eric

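@DSM I think you're right. Considering my answer has no votes and Eric's has 7, his should probably be the accepted answer. If the OP switches the accepted answer to Eric's, I can delete mine. – Skunkwaffle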

You can parse the file with the csv module:

import csv

with open('logfile.txt') as logfile:
    for row in csv.reader(logfile):
        no, time, source, dest, protocol, info = row
        # do stuff with these
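The advantage over splitting on commas by hand shows up when a quoted field itself contains a comma: csv.reader respects the quotes and strips them. A small sketch; the log line below is made up for illustration (an Info field containing a comma) and is not taken from the question's log:

```python
import csv
import io

# Hypothetical log line whose Info field contains a comma
line = '"6","0.600000","172.16.112.50","172.16.113.168","SMTP","From: a, b"\n'

naive = line.strip().split(',')
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 7: the comma inside "From: a, b" splits that field in two
print(len(parsed))  # 6
print(parsed[4])    # SMTP (the naive version would give '"SMTP"' with quotes)
```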

I can't quite tell what you're asking, but I think you want:

import csv
from collections import defaultdict

# A dictionary whose values are by default (a
# dictionary whose values are by default 0)
bySource = defaultdict(lambda: defaultdict(int))

with open('logfile.txt') as logfile:
    for row in csv.DictReader(logfile):
        bySource[row["Source"]][row["Protocol"]] += 1

for source, protocols in bySource.items():
    # Add the total before printing so it appears in the output
    protocols['Total'] = sum(protocols.values())

    print("From IP %s Protocol Count: %s" % (
        source,
        ' '.join("(%s: %d)" % item for item in protocols.items())
    ))


2012-10-24 20:12:46 Eric

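Thanks for the reply. So I would simply loop through the results for each IP address and count the number of times each protocol appears? Thanks –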

@JohnSmith: My code already does that. Did you miss my update? – Eric

Thanks, I just saw the update. Thanks a lot, Eric, I really appreciate the help. Will this still work if the file being processed is 1-2 GB? –
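On the file-size question: readlines() loads the whole file into memory at once, while iterating over the open file (as csv.reader and csv.DictReader do) handles one line at a time, so memory use grows with the number of distinct IP/protocol combinations rather than with the file size. A minimal sketch of a streaming counter; `count_protocols` is a hypothetical name, and it accepts any iterable of lines so it can be fed an open file:

```python
import csv
import io
from collections import Counter, defaultdict


def count_protocols(lines):
    """Stream CSV lines and return {source_ip: Counter({protocol: count})}."""
    by_source = defaultdict(Counter)
    # DictReader consumes `lines` lazily, one row at a time
    for row in csv.DictReader(lines):
        by_source[row["Source"]][row["Protocol"]] += 1
    return by_source


# With a real file:  with open('logfile.txt', newline='') as f: counts = count_protocols(f)
sample = io.StringIO(
    '"No.","Time","Source","Destination","Protocol","Info"\n'
    '"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..."\n'
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"\n'
    '"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."\n'
)
counts = count_protocols(sample)
print(dict(counts["172.16.113.168"]))  # {'TCP': 1, 'TELNET': 1}
```

The csv documentation recommends opening files with `newline=''`, as in the comment above.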

I would start by reading the file into a list:

with open("file_path") as f:
    contents = f.readlines()

Then you can split each line into its own list:

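# Skip the header row and blank lines; strip whitespace, drop the outer quotes, split on '","'
ips = [l.strip()[1:-1].split('","') for l in contents[1:] if l.strip()]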
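Each string returned by readlines() still ends with a newline (and the sample lines in the question also carry a trailing space), so stripping whitespace before slicing off the outer quotes matters; otherwise [1:-1] removes the newline instead of the closing quote. A sketch with a hypothetical `split_line` helper:

```python
def split_line(line):
    """Split one quoted CSV log line into its fields."""
    # strip() removes the trailing newline/space, [1:-1] removes the outer quotes
    return line.strip()[1:-1].split('","')


raw = '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" \n'
print(repr(raw[1:-1].split('","')[-1]))  # '21582 > telnet [ACK]" ' (closing quote left behind)
print(split_line(raw)[-1])               # 21582 > telnet [ACK]
```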

Then we can map these into a dictionary:

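sourceIps = {}
for ip in ips:
    try:
        # Append this row to the list for its source IP (third column)
        sourceIps[ip[2]].append(ip)
    except KeyError:
        # First row seen for this IP: start a new list
        sourceIps[ip[2]] = [ip]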
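To get from the grouped rows to per-protocol counts (the follow-up question in the comments), each IP's list of rows can be passed through collections.Counter. A sketch, assuming a dictionary shaped like `sourceIps` (source IP mapped to a list of split rows); the sample data is made up from the question's log lines and `summarize` is a hypothetical name:

```python
from collections import Counter

# Hypothetical grouped data in the same shape as sourceIps: ip -> list of split rows
sourceIps = {
    "172.16.113.168": [
        ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
        ["4", "0.530125", "172.16.113.168", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ],
    "172.16.112.50": [
        ["2", "0.000426", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
        ["5", "0.530634", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
    ],
}


def summarize(groups):
    """Return one 'From IP ...' line per source IP."""
    lines = []
    for ip, rows in groups.items():
        # Column 4 holds the protocol; Counter tallies how often each one appears
        protocol_counts = Counter(row[4] for row in rows)
        parts = ' '.join("({0} {1})".format(p, n) for p, n in sorted(protocol_counts.items()))
        lines.append("From IP {0} Protocol Count: {1} (Total: {2})".format(ip, parts, len(rows)))
    return lines


for line in summarize(sourceIps):
    print(line)
```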

Finally, print out the results:

for ip, stuff in sourceIps.items():
    print("From {0} ... ".format(ip, ...))


2012-10-24 20:13:44 Will

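PS: This is mainly meant as an example of some of the programming constructs available in Python, rather than the "best" Python solution, since the OP seems to be looking for introductory programming information. – Will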

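Thanks a lot. This helps with setting up the lists, but how do I count the number of each unique protocol for each IP address? Again, each IP address appears many times with different protocols. –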

Updated to add the counting to the map. – Will
