2012-10-24 80 views
6

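Python: parsing a log file for IP addresses and protocols

This is my first question here on Stack Overflow, and I'm really looking forward to becoming part of this community. I'm new to programming, and Python is the first language many people recommended.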

Anyway, I have a log file that looks like this:

"No.","Time","Source","Destination","Protocol","Info" 
"1","0.000000","120.107.103.180","172.16.112.50","TELNET","Telnet Data ..." 
"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..." 
"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]" 
"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..." 
"5","0.530634","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..." 

I want to use Python to parse the log file and produce a result that looks like this:

From IP 135.13.216.191 Protocol Count: (IMF 1) (SMTP 38) (TCP 24) (Total: 63)

I'd really like some help with this. What approach should I take: a list that I loop over, or a dictionary/tuple?

Thanks in advance for your help!

+1

Where does '135.13.216.191' come from? – Eric

+0

It's just an example, but it comes from a line in the destination field. –

Answers

0

First, read in the text file:

# Open the file; the with block closes it automatically when done
with open('log_file.csv') as log_file:
    # readlines() returns the data as a list of strings, one for each line
    log_data = log_file.readlines()

Set up a dictionary to hold the results:

results = {} 

Now loop over the data one line at a time, recording the protocols in the dictionary:

for entry in log_data[1:]:  # skip the header row
    # Split on commas and strip the surrounding quotes from each field
    entry_data = [field.strip('"') for field in entry.strip().split(',')]
    source, protocol = entry_data[2], entry_data[4]
    # We keep a separate entry for each source IP.
    # If we haven't already seen this IP, make an entry for it
    if source not in results:
        results[source] = {'total': 0}
    # Now check whether we've seen this protocol for this IP before.
    # If we haven't, add a new entry set to 0
    if protocol not in results[source]:
        results[source][protocol] = 0
    # Increment the count for this protocol
    results[source][protocol] += 1
    # And increment the total count
    results[source]['total'] += 1

Once you've counted everything, just loop over your counts and print the results:

for ip in results:
    # Here we're printing a string with placeholders. The {0} and {1} will be
    # filled in by the call to format
    print("From IP {0} Protocol Count: {1}".format(
        ip,
        # Finally, create the value for the protocol counts with another format call.
        # The square brackets with the for statement inside create a list with one
        # entry for each protocol.
        # We use ' '.join to join the counts with a space between them
        ' '.join(["({0}: {1})".format(protocol, results[ip][protocol])
                  for protocol in results[ip]])))
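
The line format asked for in the question (per-protocol counts followed by a total in parentheses) could be produced by a small helper along these lines. This is only a sketch: `format_counts` is a hypothetical name, the protocols are sorted alphabetically just to make the output predictable, and it assumes the counts live in a dictionary that may also carry the running 'total' key.

```python
def format_counts(ip, counts):
    """Build one report line for `ip` from a {protocol: count} dict.

    `counts` may contain a running 'total' key; it is left out of the
    per-protocol list and the total is recomputed for the summary.
    """
    protocols = {name: n for name, n in counts.items() if name != 'total'}
    parts = ["({0} {1})".format(name, protocols[name]) for name in sorted(protocols)]
    parts.append("(Total: {0})".format(sum(protocols.values())))
    return "From IP {0} Protocol Count: {1}".format(ip, " ".join(parts))


print(format_counts("172.16.113.168", {"total": 2, "TELNET": 1, "TCP": 1}))
```

Keeping the formatting in its own function means the counting loop does not need to change if the report layout does.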
+0

Thanks Skunkwaffle, I really like the simple code. –

+1

@JohnSmith: Wouldn't you use the built-in [`csv` module](http://docs.python.org/library/csv.html) to parse your CSV file? – Eric

+0

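@DSM I think you're right. Considering that my answer has no votes and Eric's has 7, his should probably be the accepted answer. If the OP switches the accepted answer to Eric's, I can delete mine. – Skunkwaffle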

9

You can parse the file with the csv module:

import csv

with open('logfile.txt') as logfile:
    for row in csv.reader(logfile):
        no, time, source, dest, protocol, info = row
        # do stuff with these
I can't quite tell what you're asking, but I think you want:

import csv
from collections import defaultdict

# A dictionary whose values are by default (a
# dictionary whose values are by default 0)
bySource = defaultdict(lambda: defaultdict(lambda: 0))

with open('logfile.txt') as logfile:
    # DictReader uses the header row as keys, so it is not counted as data
    for row in csv.DictReader(logfile):
        bySource[row["Source"]][row["Protocol"]] += 1

# items() and print(...) work on both Python 2 and Python 3
for source, protocols in bySource.items():
    protocols['Total'] = sum(protocols.values())

    print("From IP %s Protocol Count: %s" % (
        source,
        ' '.join("(%s: %d)" % item for item in protocols.items())
    ))
+0

Thanks for the reply. So would I simply loop over the results for each IP address and count how many times each protocol appears? Thanks –

+0

@JohnSmith: My code already does that. Did you miss my update? – Eric

+0

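Thanks, I just saw the update. Thank you very much, Eric, I really appreciate the help. Will this still work even if the files being used are 1-2 GB in size? –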
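
On the file-size question, a rough sketch: `csv.DictReader` pulls rows from the file one at a time, so what stays in memory is the table of counts, not the file contents. Memory use therefore grows with the number of distinct source/protocol pairs rather than with the file size. The function name `count_protocols` and the in-memory sample are assumptions for illustration; with a real log you would pass the open file object instead of the list.

```python
import csv
from collections import Counter, defaultdict


def count_protocols(lines):
    """Count protocols per source IP from an iterable of CSV lines.

    `lines` can be an open file object; rows are consumed one at a time,
    so only the counts are kept in memory, never the whole file.
    """
    by_source = defaultdict(Counter)
    for row in csv.DictReader(lines):
        by_source[row["Source"]][row["Protocol"]] += 1
    return by_source


# Small in-memory stand-in for the log file (lines from the question's sample)
sample = [
    '"No.","Time","Source","Destination","Protocol","Info"',
    '"2","0.000426","172.16.112.50","172.16.113.168","TELNET","Telnet Data ..."',
    '"3","0.019849","172.16.113.168","172.16.112.50","TCP","21582 > telnet [ACK]"',
    '"4","0.530125","172.16.113.168","172.16.112.50","TELNET","Telnet Data ..."',
]
counts = count_protocols(sample)
print(dict(counts["172.16.113.168"]))
```

By contrast, approaches that call `readlines()` first load every line into a list, which is where a multi-gigabyte file is likely to become a problem.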

1

I would start by reading the file into a list:

contents = [] 
with open("file_path") as f: 
    contents = f.readlines() 

Then you can split each line into its own list:

# strip() removes the trailing newline first, so [1:-1] drops exactly the outer quotes
ips = [l.strip()[1:-1].split('","') for l in contents]

Then we can map these into a dictionary:

sourceIps = {}
for ip in ips:
    try:
        sourceIps[ip[2]].append(ip)
    except KeyError:
        # First time we see this source IP: start a new list for it
        sourceIps[ip[2]] = [ip]

Finally, print the results:

for ip, stuff in sourceIps.items():
    print("From {0} ... ".format(ip, ...))
+0

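PS: This is mainly meant as an example of some programming constructs available in Python, rather than the "best" solution using Python, since the OP seems to be looking for introductory programming information. – Will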

+0

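Thank you very much. This helps with setting up the lists, but how do I count the number of unique protocols per IP address? Again, each IP address is repeated multiple times with different protocols. –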
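
One way to get per-IP protocol counts from rows that are already split into fields is `collections.Counter`. The sketch below assumes each row is a list with the source IP at index 2 and the protocol at index 4, as in the sample log; the names `source_ips` and `protocol_counts` are made up for illustration.

```python
from collections import Counter

# Rows already split into fields (values copied from the sample log in the question)
ips = [
    ["2", "0.000426", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
    ["3", "0.019849", "172.16.113.168", "172.16.112.50", "TCP", "21582 > telnet [ACK]"],
    ["4", "0.530125", "172.16.113.168", "172.16.112.50", "TELNET", "Telnet Data ..."],
    ["5", "0.530634", "172.16.112.50", "172.16.113.168", "TELNET", "Telnet Data ..."],
]

# Group rows by source IP (index 2)
source_ips = {}
for row in ips:
    source_ips.setdefault(row[2], []).append(row)

# For each source IP, count how often each protocol (index 4) appears
protocol_counts = {
    ip: Counter(row[4] for row in rows) for ip, rows in source_ips.items()
}

for ip, counts in protocol_counts.items():
    # len(counts) is the number of distinct protocols seen for this IP
    print(ip, dict(counts), "distinct protocols:", len(counts))
```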

+0

Updated to add the mapping into a dictionary – Will