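Count the number of times a string occurs in a specific column
I want to see how many times a string occurs in column 4. More specifically, how many times each port number occurs in some Netflow data. There are thousands of ports, so I'm not searching for any particular value. I have already parsed the column down to the number that follows the colon, and I want the code to count how many times that number occurs, so the final output should print each number together with the number of times it occurred, like this: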
[OUTPUT]
Port: 80 found: 3 times.
Port: 53 found: 2 times.
Port: 21 found: 1 times.
[CODE]
import re
frequency = {}
file = open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r')

with open('/Users/rojeliomaestas/Desktop/nettest2.txt', 'r') as infile:
    next(infile)
    for line in infile:
        data = line.split()[4].split(":")[1]
        text_string = file.read().lower()
        match_pattern = re.findall(data, text_string)

for word in match_pattern:
    count = frequency.get(word,0)
    frequency[word] = count + 1

frequency_list = frequency.keys()

for words in frequency_list:
    print ("port:", words,"found:", frequency[words], "times.")
[FILE]
Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
2017-04-02 12:07:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:8080 1 345 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:80 1 75 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:80 -> 205.166.231.250:69 1 875 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:443 1 275 1
2017-04-02 12:08:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:23 1 842 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:21 -> 205.166.231.250:25 1 146 1
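For reference, here is a minimal sketch of how one row of this file splits on whitespace and where the source port ends up. The helper name `source_port` is made up for illustration, and it assumes every data row has the same layout as the sample above (date, time, duration, protocol, then source `IP:port`).

```python
def source_port(line):
    """Return the source port of one data row as a string."""
    fields = line.split()
    # fields[0] = date, fields[1] = time, fields[2] = duration,
    # fields[3] = protocol, fields[4] = source "IP:port"
    # rsplit on the last colon so only the port is taken.
    return fields[4].rsplit(":", 1)[1]


sample = "2017-04-02 12:07:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:8080 1 345 1"
print(source_port(sample))  # prints 80
```

Note that the index is 4 because the date and time are separate whitespace-delimited fields.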
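OK. What is your question?
By the way, why are you using both 'file.read' *and* 'for line in infile'? That seems barking mad.
Also, the final output loop should be: 'for port, count in d.items(): print("port:", port, "found:", count, "times.")', or use 'iteritems' if you are stuck on Python 2.7.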
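Putting the comments above together, one possible approach is to drop the second file handle and the regex entirely and just tally the port from each row as it is read. This is only a sketch: it swaps the hand-rolled dict for `collections.Counter` from the standard library, the function name `count_source_ports` is hypothetical, and it assumes the rows look like the sample in the question.

```python
from collections import Counter


def count_source_ports(lines):
    """Tally the source port (field 4, after the last colon) of each row."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank or malformed rows that lack an "IP:port" in field 4.
        if len(fields) < 5 or ":" not in fields[4]:
            continue
        counts[fields[4].rsplit(":", 1)[1]] += 1
    return counts


# The sample rows from the question, inlined so the sketch runs on its own.
sample = """Date first seen Duration Proto Src IP Addr:Port Dst IP Addr:Port Packets Bytes Flows
2017-04-02 12:07:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:8080 1 345 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:80 1 75 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:80 -> 205.166.231.250:69 1 875 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:53 -> 205.166.231.250:443 1 275 1
2017-04-02 12:08:32.079 9.298 UDP 8.8.8.8:80 -> 205.166.231.250:23 1 842 1
2017-04-02 12:08:32.079 9.298 TCP 8.8.8.8:21 -> 205.166.231.250:25 1 146 1
""".splitlines()

counts = count_source_ports(sample[1:])  # sample[0] is the header row
for port, count in counts.most_common():
    print("Port:", port, "found:", count, "times.")
```

With a real file, the same function can be fed the open file object directly: open it once in a 'with' block, call 'next(infile)' to skip the header, then pass 'infile' to 'count_source_ports'. Each row is read exactly once, so there is no need for 'file.read()' or 're.findall'.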