2012-12-24

Python generator: invalid literal for int()

I am using the field_map generator function from http://www.dabeaz.com/generators/fieldmap.py:

    #!/usr/bin/env python

    def field_map(dictseq, name, func):
        for d in dictseq:
            d[name] = func(d[name])  # convert the field in place (mutates d)
            yield d

    if __name__ == '__main__':
        loglines = open("test.log")
        import re
        logpats = r'(\S+) (\S+) (\S+) (\S+) (\S+) \[(.*?)\] \"(.*?)\" (\S+) (\S+) \"(.*?)\" \"(.*?)\" (\S+) \"(.*?)\" \"(.*?)\" (\S+)'
        logpat = re.compile(logpats)
        groups = (logpat.match(line) for line in loglines)
        tuples = (g.groups() for g in groups if g)  # drop lines the regex does not match
        #for t in tuples:
        #    print(t)

        colnames = ('record_id', 'elapsed_time', 'client', 'username', 'client_id', 'date',
                    'http_method_url', 'status', 'size', 'http_referer', 'useragent', 'mime',
                    'filter_name_reason', 'profiles', 'ipport')
        log = (dict(zip(colnames, t)) for t in tuples)
        log = field_map(log, "status", int)
        log = field_map(log, "size", lambda s: int(s) if s != '-' else 0)  # '-' means no size
        for x in log:
            print(x)

It gives this error. Any ideas?

    # python fieldmap.py
    Traceback (most recent call last):
      File "fieldmap.py", line 24, in <module>
        for x in log:
      File "fieldmap.py", line 4, in field_map
        for d in dictseq:
      File "fieldmap.py", line 5, in field_map
        d[name] = func(d[name])
    ValueError: invalid literal for int() with base 10: 'status'

test.log contains data in this format:

    "1356313509.519-6-10.66.54.21-8080" 2089 10.112.151.213 "[email protected]" "6" [24/Dec/2012:01:45:11] "GET http://apps.facebook.com:80/thesimssocial/?fb_source=bookmark_apps&ref=bookmarks&count=2&fb_bmpos=4_2" 200 58300 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11 BMID/E679E9E153" text/html "- -" "M&B-112,HTTP,QUERIES,uncachable,antivirus,REDIRECT_THIS" "10.66.54.21:8080"

What is in 'test.log'?

int() is a built-in function; it only accepts strings that contain integers. Why are you passing 'int' to the function 'field_map'?

@AshwiniChaudhary Passing 'int' makes perfect sense here: 'field_map' is a kind of odd 'map()' with side effects.
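
A minimal sketch of the side-effect point (Python 3; the in-memory records are hypothetical stand-ins for parsed log lines, not data from test.log): field_map converts the field inside each original dict instead of building new ones, and int raises exactly the error from the question when it is handed the string 'status'.

```python
def field_map(dictseq, name, func):
    # Same generator as in the question: replaces d[name] with func(d[name])
    # inside each existing dict, then yields that same dict.
    for d in dictseq:
        d[name] = func(d[name])
        yield d


# Hypothetical records standing in for parsed log lines.
records = [{"status": "200"}, {"status": "404"}]
converted = list(field_map(records, "status", int))

# The side effect: the original dicts were modified, not copied.
print(records)                     # [{'status': 200}, {'status': 404}]
print(converted[0] is records[0])  # True

# A header-like record whose "status" field holds the column name.
try:
    list(field_map([{"status": "status"}], "status", int))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # invalid literal for int() with base 10: 'status'
```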

Answer

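The first line in test.log probably is a header containing the field names rather than their values. That's why you see 'status' instead of, for example, '200'.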
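
Assuming the header is exactly one line at the top of the file, one way to handle it is to consume that line before the regex pipeline runs. The sketch below uses io.StringIO as a hypothetical stand-in for test.log and a cut-down two-field pattern instead of the real 15-field one; with a real file, next(loglines, None) works the same way on the object returned by open("test.log").

```python
import io
import re


def field_map(dictseq, name, func):
    # Generator from the question: convert d[name] in place and yield d.
    for d in dictseq:
        d[name] = func(d[name])
        yield d


# Hypothetical stand-in for test.log: one header row, then two data rows.
sample = io.StringIO("status size\n200 58300\n304 -\n")

logpat = re.compile(r'(\S+) (\S+)')  # \S+ would also match the header words
colnames = ('status', 'size')

loglines = iter(sample)
next(loglines, None)  # skip the first line; assumes exactly one header line

groups = (logpat.match(line) for line in loglines)
tuples = (g.groups() for g in groups if g)
log = (dict(zip(colnames, t)) for t in tuples)
log = field_map(log, "status", int)
log = field_map(log, "size", lambda s: int(s) if s != '-' else 0)

result = list(log)
print(result)  # [{'status': 200, 'size': 58300}, {'status': 304, 'size': 0}]
```

Skipping by position is fragile if the header can appear elsewhere in the file; filtering by content (as the regex suggestion does) does not depend on where the header sits.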

You could make your regex more selective so that it filters out unsuitable lines, for example by using \d+ to match the HTTP status.
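
Applying the \d+ suggestion to a cut-down, hypothetical two-field version of the pattern: because the header row's status field is the word 'status', the match fails, and the `if g` filter in the question's tuples generator drops the line. Allowing `-` in the size group mirrors the question's `lambda s: int(s) if s != '-' else 0`; that alternation is an assumption, not part of the answer. In the full 15-group pattern, the status field is the eighth group, the first (\S+) after the quoted request.

```python
import re

# Simplified two-field pattern (hypothetical; the real one has 15 groups).
# \d+ only matches digits, so a header line such as "status size" fails.
logpat = re.compile(r'(\d+) (\d+|-)')

lines = ["status size\n", "200 58300\n", "304 -\n"]
groups = (logpat.match(line) for line in lines)
tuples = [g.groups() for g in groups if g]  # non-matching lines are dropped here
print(tuples)  # [('200', '58300'), ('304', '-')]
```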

You're right ;) – krisdigitx