2013-08-26 64 views
-3

Python regex and list processing: I have a file like the one below.

ElapsedTime2.68s: PlaceOrder 
ElapsedTime2.69s: ClassARestCAll 
ElapsedTime0.11s: GetOrderList 
ElapsedTime0.11s: ClassARestCAll 
ElapsedTime2.10s: PlaceOrder 
ElapsedTime2.11s: ClassARestCAll 
ElapsedTime0.10s: GetOrderList 
ElapsedTime0.10s: ClassARestCAll 
ElapsedTime2.00s: PlaceOrder 
ElapsedTime2.01s: ClassARestCAll 
ElapsedTime0.28s: GetOrderList 
ElapsedTime0.28s: ClassARestCAll 
ElapsedTime1.64s: PlaceOrder 
ElapsedTime1.65s: ClassARestCAll 
ElapsedTime0.11s: GetOrderList 
ElapsedTime0.11s: ClassARestCAll 
ElapsedTime1.99s: PlaceOrder 
ElapsedTime2.01s: ClassARestCAll 

How can I parse this file to get a result like this?

                 average   min    max
ClassARestCAll   1.23      0.1    2.69
GetOrderList     0.15      0.1    0.28
PlaceOrder       2.082     1.64   2.68

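I wrote a solution using regular expressions and list operations. However, my approach scans the whole list again for every distinct method name.

How can I get the statistics for all API names while scanning the list only once?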

import re

def get_stats(N, p_api):
    list_of_rt = []
    for line in N:
        y = re.split(r"\s+", line)
        if y[1] == p_api:
            curr_rt = float(y[0][11:-2])
            list_of_rt.append(curr_rt)

    min_rt, max_rt = min(list_of_rt), max(list_of_rt)
    total_rt, total_cnt = sum(list_of_rt), len(list_of_rt)
    print p_api, min_rt, max_rt, "%.3f" % round(total_rt/total_cnt, 3), total_cnt


ifile = open('data1.txt', 'r').read()
api_rts = re.findall(r'ElapsedTime\d*.\d*s: \S*', ifile)


list_of_api_names = []
for api_rt in api_rts:
    y = re.split(r"\s+", api_rt)
    list_of_api_names.append(y[1])

# get distinct list of API names

distinct_apis = set(list_of_api_names)

print 'api   min, max, average, total occurences'

# for each API name call get_stats

for api in distinct_apis:
    get_stats(api_rts, api)
+1

Have you tried anything yourself? :3 – TerryA

+1

Yes, I did. Here is the output of my program: api min, max, average, total occurences GetOrderList 0.1 0.28 0.150 4 PlaceOrder 1.64 2.68 2.082 5 ClassARestCAll 0.1 2.69 1.230 9 – user2716941

+0

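Your `re.findall(r'ElapsedTime\d*.\d*s: \S*', ifile)` could be replaced by `open('data1.txt','r').readlines()` (or by the same call with `.strip()` applied to each line, if the lines have whitespace at the start and end). – eyquem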

Answers

1
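import re
from collections import defaultdict

# Capture the elapsed time (group 1) and the API name (group 2)
rgx = re.compile(r'ElapsedTime(\d*\.\d*)s: (\S*)')

# Map each API name to the list of its response times
d = defaultdict(list)

# Single pass over the file: every match is filed under its API name
with open('data1.txt', 'r') as f:
    for m in rgx.finditer(f.read()):
        d[m.group(2)].append(float(m.group(1)))

# Width of the longest API name, used to align the first column
lapi = max(map(len, d.iterkeys()))

print '{: ^{width}} min max average total occurences'.format('api', width=lapi)
pat = '{0:%d} {1:.2f} {2:.2f} {3:.3f}  {4}' % lapi
print '\n'.join(pat.format(api, min(li), max(li), sum(li)/len(li), len(li))
                for api, li in d.iteritems())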
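
If only the minimum, maximum, average and count are needed, the per-API lists do not have to be stored at all. The following is a minimal sketch of that variation, not the answerer's code. It is written for Python 3 (the snippets in this thread are Python 2), the names `LINE_RE` and `running_stats` are hypothetical, and it assumes every line has the `ElapsedTime<seconds>s: <name>` shape shown in the question.

```python
import re

# Hypothetical pattern for lines such as "ElapsedTime2.68s: PlaceOrder"
LINE_RE = re.compile(r'ElapsedTime(\d+(?:\.\d+)?)s:\s*(\S+)')


def running_stats(lines):
    """Return {api: (min, max, average, count)} after one pass over lines."""
    acc = {}  # api name -> [min, max, total, count]
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip blank or unrelated lines
        rt = float(m.group(1))
        api = m.group(2)
        entry = acc.get(api)
        if entry is None:
            acc[api] = [rt, rt, rt, 1]
        else:
            entry[0] = min(entry[0], rt)
            entry[1] = max(entry[1], rt)
            entry[2] += rt
            entry[3] += 1
    return {api: (lo, hi, total / n, n)
            for api, (lo, hi, total, n) in acc.items()}


# A few lines copied from the question's sample file
sample = [
    "ElapsedTime2.68s: PlaceOrder",
    "ElapsedTime2.69s: ClassARestCAll",
    "ElapsedTime0.11s: GetOrderList",
    "ElapsedTime0.11s: ClassARestCAll",
]

for api, (lo, hi, avg, n) in sorted(running_stats(sample).items()):
    print('{:<15} {:.2f} {:.2f} {:.3f} {}'.format(api, lo, hi, avg, n))
```

Because `running_stats` accepts any iterable of lines, an open file object can be passed directly, so the file never has to be read into memory as a whole.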
0
import re, numpy
from collections import defaultdict

# Map each API name to the list of its response times
data = defaultdict(list)
with open('data1.txt') as f:
    for line in f:
        # strip() so that trailing whitespace cannot stop \w+$ from matching
        l = re.findall(r'([\d.]+|\w+$)', line.strip())
        if len(l) == 2:  # skip blank or malformed lines
            data[l[1]].append(float(l[0]))

# Each metric is a [label, function] pair applied to a list of times
metrics = [['avg', numpy.average], ['min', min], ['max', max]]
summary = defaultdict(dict)
for k, l in data.items():
    summary[k] = {m[0]: m[1](l) for m in metrics}

print " " * 17 + "\t".join("%-5s" % m[0] for m in metrics)
for k, s in summary.items():
    print "%-15s: " % k + "\t".join("%.3f" % s[m[0]] for m in metrics)
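
The table-of-metrics idea above does not strictly need NumPy. Below is a minimal sketch of the same approach using only the standard library. It is an illustration rather than the answerer's code: it targets Python 3 (the snippets in this thread are Python 2), `summarize` and `METRICS` are hypothetical names, and `statistics.mean` stands in for `numpy.average`.

```python
import re
from collections import defaultdict
from statistics import mean

# (label, function) pairs; adding a metric means adding one entry here
METRICS = [('avg', mean), ('min', min), ('max', max)]

# Assumed line shape: "ElapsedTime<seconds>s: <name>"
LINE_RE = re.compile(r'ElapsedTime(\d+(?:\.\d+)?)s:\s*(\S+)')


def summarize(lines):
    """Group times by API name in one pass, then apply every metric."""
    data = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:  # blank or malformed lines are ignored
            data[m.group(2)].append(float(m.group(1)))
    return {api: {name: fn(times) for name, fn in METRICS}
            for api, times in data.items()}


# A few lines copied from the question's sample file
sample = [
    "ElapsedTime0.11s: GetOrderList",
    "ElapsedTime0.10s: GetOrderList",
    "ElapsedTime2.68s: PlaceOrder",
]

print(" " * 17 + "\t".join("%-5s" % name for name, _ in METRICS))
for api, row in sorted(summarize(sample).items()):
    print("%-15s: " % api + "\t".join("%.3f" % row[name] for name, _ in METRICS))
```

Keeping the metrics in one list means the header and every row are generated from the same source, so the columns cannot drift out of sync when a metric is added or removed.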