Is there a way to make this function look better?

I need some logic that extracts the URL from an Apache log file. Right now I do it like this:

apache_log = {'@source': 'file://xxxxxxxxxxxxxxx//var/log/apache2/access.log', '@source_host': 'xxxxxxxxxxxxxxxxxxx', '@message': 'xxxxxxxxxxxxxxx xxxxxxxxxx - - [02/Aug/2013:12:38:37 +0000] "POST /user/12345/product/2 HTTP/1.1" 404 513 "-" "PycURL/7.26.0"', '@tags': [], '@fields': {}, '@timestamp': '2013-08-02T12:38:38.181000Z', '@source_path': '//var/log/apache2/access.log', '@type': 'Apache-access'} 
data = apache_log['@message'].split() 
if data.index('"POST') and data[data.index('"POST')+2].startswith('HTTP'): 
    print data[data.index('"POST')+1]

It returns:

/user/12345/product/2

The result is correct, but I don't like the way I got it.

Can someone suggest a better (more Pythonic) way to extract this path from an Apache log file?

2013-08-02 Vor

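This belongs on codereview.SE. – geoffspear

Use a Python regexp. –

I don't think the `if data.index('"POST')` part works the way you want it to. For future reference, checking whether something is in a list is just `'"POST' in data`. – user2357112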
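To illustrate the comment above: `list.index` returns a position (falsy only when it is 0) and raises `ValueError` when the item is absent, so it cannot act as a membership test. A minimal sketch of the split-based approach with that fixed, using a shortened, hypothetical log message and a hypothetical helper name `extract_post_path`:

```python
# Shortened sample message (hypothetical, modelled on the log line in the question).
message = 'host - - [02/Aug/2013:12:38:37 +0000] "POST /user/12345/product/2 HTTP/1.1" 404 513'
data = message.split()

def extract_post_path(tokens):
    """Return the token following '"POST', or None if the pattern is absent."""
    if '"POST' not in tokens:        # membership test; never raises
        return None
    i = tokens.index('"POST')        # safe now: the token is known to exist
    # Guard against a truncated line before looking two tokens ahead.
    if i + 2 < len(tokens) and tokens[i + 2].startswith('HTTP'):
        return tokens[i + 1]
    return None

print(extract_post_path(data))
print(extract_post_path('no request here'.split()))
```

The first call prints the path and the second prints `None` instead of raising.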

A regular expression would work better here:

import re 

post_path = re.compile(r'"POST (/\S+) HTTP') 

match = post_path.search(apache_log['@message']) 
if match: 
    print match.group(1)

Demo:

>>> import re 
>>> apache_log = {'@source': 'file://xxxxxxxxxxxxxxx//var/log/apache2/access.log', '@source_host': 'xxxxxxxxxxxxxxxxxxx', '@message': 'xxxxxxxxxxxxxxx xxxxxxxxxx - - [02/Aug/2013:12:38:37 +0000] "POST /user/12345/product/2 HTTP/1.1" 404 513 "-" "PycURL/7.26.0"', '@tags': [], '@fields': {}, '@timestamp': '2013-08-02T12:38:38.181000Z', '@source_path': '//var/log/apache2/access.log', '@type': 'Apache-access'} 
>>> post_path = re.compile(r'"POST (/\S+) HTTP') 
>>> match = post_path.search(apache_log['@message']) 
>>> if match: 
...  print match.group(1) 
... 
/user/12345/product/2
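If requests other than POST also need handling, the same idea probably extends to a pattern that captures the method too. This is only a sketch: it assumes the request line has the usual quoted `"METHOD /path HTTP/x.y"` form, and the names `request_line` and `parse_request` are hypothetical, not part of the answer above.

```python
import re

# Assumed format: "METHOD /path HTTP/version" inside double quotes.
request_line = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>/\S*) HTTP/(?P<version>[\d.]+)"'
)

def parse_request(message):
    """Return (method, path) from an access-log message, or None if no request line is found."""
    match = request_line.search(message)
    if match is None:
        return None
    return match.group('method'), match.group('path')

message = ('xxxxxxxxxxxxxxx xxxxxxxxxx - - [02/Aug/2013:12:38:37 +0000] '
           '"POST /user/12345/product/2 HTTP/1.1" 404 513 "-" "PycURL/7.26.0"')
print(parse_request(message))
```

Named groups keep the call site readable, and returning `None` for a non-matching line leaves it to the caller to decide how to treat malformed entries.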

2013-08-02 16:28:12

Thanks @Martijn, your answers are always great! – Vor
