2014-09-22 81 views

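Python dictionary sum

Hi everyone, I have a question about how to sum the values for identical IP addresses in a dictionary. My input file looks like this: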

IP   , Byte 
10.180.176.61,3669 
10.164.134.193,882 
10.164.132.209,4168 
10.120.81.141,4297 
10.180.176.61,100 

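My goal is to open the file and parse each IP address together with the number after the comma, so that I can sum all the bytes for each IP address. The result should look like this: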

IP 10.180.176.61 , 3769

My code looks like this:

#!/usr/bin/python 
# -*- coding: utf-8 -*- 

import re,sys, os 
from collections import defaultdict 

f  = open('splited/small_file_1000000.csv','r') 
o  = open('gotovo1.csv','w') 

list_of_dictionaries = {} 

for line in f: 
    if re.search(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}.*',line): 
     line_ip = re.findall(r'\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}',line)[0] 
     line_by = re.findall(r'\,\d+',line)[0] 
     line_b = re.sub(r'\,','',line_by) 

     list_of_dictionaries['IP'] = line_ip 
     list_of_dictionaries['VAL'] = int(line_b) 


c = defaultdict(int) 
for d in list_of_dictionaries: 
    c[d['IP']] += d['VAL'] 

print c 

Any ideas would be great.

Answers

1

Use the csv module to read the file and collections.Counter to sum the totals for each IP address:

from collections import Counter 
import csv 


def read_csv(fn):
    with open(fn, 'r') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader)  # Skip header
        for row in reader:
            ip, bytes = row
            yield ip, int(bytes)


totals = Counter() 
for ip, bytes in read_csv('data.txt'): 
    totals[ip] += bytes 

print totals 

Output:

Counter({'10.120.81.141': 4297, '10.164.132.209': 4168, '10.180.176.61': 3769, '10.164.134.193': 882}) 
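
The question also opens an output file (gotovo1.csv) but never writes to it. Below is a minimal sketch of that last step, assuming Python 3 and assuming the wanted output is one "ip,total" row per address, largest total first. The helper names sum_bytes_by_ip and write_totals, and the in-memory sample standing in for the real file, are illustrative and not part of the original answer.

```python
import csv
import io
from collections import Counter


def sum_bytes_by_ip(lines):
    """Return a Counter mapping each IP address to its total byte count."""
    reader = csv.reader(lines)
    next(reader)  # skip the "IP, Byte" header row
    totals = Counter()
    for row in reader:
        if not row:  # ignore blank lines
            continue
        ip, byte_count = row
        # int() tolerates the trailing space after each number in the sample
        totals[ip.strip()] += int(byte_count)
    return totals


def write_totals(totals, out):
    """Write one "ip,total" row per address, largest total first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["IP", "Byte"])
    writer.writerows(totals.most_common())


# Sample rows copied from the question; a stand-in for the real input file.
sample = io.StringIO(
    "IP   , Byte \n"
    "10.180.176.61,3669 \n"
    "10.164.134.193,882 \n"
    "10.164.132.209,4168 \n"
    "10.120.81.141,4297 \n"
    "10.180.176.61,100 \n"
)
totals = sum_bytes_by_ip(sample)
result = io.StringIO()
write_totals(totals, result)
print(result.getvalue())
```

With real files, the same two functions could be called on handles from open('splited/small_file_1000000.csv', newline='') and open('gotovo1.csv', 'w', newline='').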
0

If your file looks like the example you provided, you don't need regular expressions to parse it.

list_of_dictionaries = {}
with open('splited/small_file_1000000.csv', 'r') as f:
    header = f.readline()  # read and discard the header line
    for line in f:
        ip, bytes = line.split(',')
        # "in" replaces dict.has_key(), which no longer exists in Python 3
        if ip in list_of_dictionaries:
            list_of_dictionaries[ip] += int(bytes.strip())
        else:
            list_of_dictionaries[ip] = int(bytes.strip())
print(list_of_dictionaries)

Output:

{'10.180.176.61': 3769, '10.164.134.193': 882, '10.164.132.209': 4168, '10.120.81.141': 4297}
In short: just split each line on the comma.
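
For comparison, the code in the question fails for two reasons: list_of_dictionaries is a single dict whose 'IP' and 'VAL' keys are overwritten on every line, and iterating over a dict yields its keys (strings), so d['IP'] raises a TypeError. The sketch below keeps the question's regex-plus-defaultdict idea but adds to the defaultdict directly. It assumes Python 3; the pattern, the function name sum_bytes, and the inline sample lines are illustrative choices, not something from the thread.

```python
import re
from collections import defaultdict

# Escaped dots match a literal "."; the second group captures the byte count
# that follows the comma.
LINE_RE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s*,\s*(\d+)")


def sum_bytes(lines):
    """Sum the byte counts per IP address, skipping lines that do not match."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # the header and any malformed lines are skipped
            ip, byte_count = match.groups()
            totals[ip] += int(byte_count)
    return totals


# Sample rows copied from the question; a stand-in for the real input file.
sample_lines = [
    "IP   , Byte ",
    "10.180.176.61,3669 ",
    "10.164.134.193,882 ",
    "10.164.132.209,4168 ",
    "10.120.81.141,4297 ",
    "10.180.176.61,100 ",
]
totals = sum_bytes(sample_lines)
for ip, total in sorted(totals.items()):
    print("IP %s , %d" % (ip, total))
```

The regex is only worth the cost if the file may contain lines that are not plain "ip,bytes" pairs; for clean input, splitting on the comma is simpler.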