2013-02-21 40 views
1

How do I correctly split the following string? - Python

I have the following file to parse:

Total Virtual Clients  :    10  (1 Machines) 
Current Connections   :    10 
Total Elapsed Time   :    50 Secs (0 Hrs,0 Mins,50 Secs) 

Total Requests    :   337827  ( 6687/Sec) 
Total Responses    :   337830  ( 6687/Sec) 
Total Bytes     :  990388848  ( 20571 KB/Sec) 
Total Success Connections :   3346  (  66/Sec) 
Total Connect Errors  :    0  (  0/Sec) 
Total Socket Errors   :    0  (  0/Sec) 
Total I/O Errors   :    0  (  0/Sec) 
Total 200 OK    :   33864  ( 718/Sec) 
Total 30X Redirect   :    0  (  0/Sec) 
Total 304 Not Modified  :    0  (  0/Sec) 
Total 404 Not Found   :   303966  ( 5969/Sec) 
Total 500 Server Error  :    0  (  0/Sec) 
Total Bad Status   :   303966  ( 5969/Sec) 

so I need a parsing algorithm that searches the file for these values. However, when I do this:

for data in temp: 
    line = data.strip().split() 
    print(line) 

where temp is my temporary buffer containing these values, I get:

['Total', 'I/O', 'Errors', ':', '0', '(', '0/Sec)'] 
['Total', '200', 'OK', ':', '69807', '(', '864/Sec)'] 
['Total', '30X', 'Redirect', ':', '0', '(', '0/Sec)'] 
['Total', '304', 'Not', 'Modified', ':', '0', '(', '0/Sec)'] 
['Total', '404', 'Not', 'Found', ':', '420953', '(', '5289/Sec)'] 
['Total', '500', 'Server', 'Error', ':', '0', '(', '0/Sec)'] 

but what I want is:

['Total I/O Errors', '0', '0'] 
['Total 200 OK', '69807', '864'] 
['Total 30X Redirect', '0', '0'] 

and so on. How can I do that?

Answers

4

You can use a regular expression like this:

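import re 
# Raw string so the backslashes reach the regex engine unchanged.
rex = re.compile(r'([^:]+\S)\s*:\s*(\d+)\s*\(\s*(\d+)/Sec\)') 
for line in temp: 
    match = rex.match(line) 
    if match: 
        # groups() returns a tuple; convert it to get the list form.
        print(list(match.groups())) 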

which will give you:

['Total Requests', '337827', '6687'] 
['Total Responses', '337830', '6687'] 
['Total Success Connections', '3346', '66'] 
['Total Connect Errors', '0', '0'] 
['Total Socket Errors', '0', '0'] 
['Total I/O Errors', '0', '0'] 
['Total 200 OK', '33864', '718'] 
['Total 30X Redirect', '0', '0'] 
['Total 304 Not Modified', '0', '0'] 
['Total 404 Not Found', '303966', '5969'] 
['Total 500 Server Error', '0', '0'] 
['Total Bad Status', '303966', '5969'] 

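Note that this only matches lines of the form "TITLE : NUMBER ( NUMBER/Sec)", so lines such as "Total Bytes" (which has "KB/Sec") or "Current Connections" are skipped. You can adjust the expression to match those other lines as well.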
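As one possible adjustment, here is a minimal sketch that also accepts a unit word before "/Sec", so the "Total Bytes" line is picked up too. The names `REX` and `parse_line` and the optional-unit group are assumptions for illustration and are not part of the original answer.

```python
import re

# Hypothetical variant of the pattern above: an optional unit word
# (for example "KB") is allowed between the rate and "/Sec".
REX = re.compile(r'([^:]+\S)\s*:\s*(\d+)\s*\(\s*(\d+)\s*(?:\w+)?/Sec\)')

def parse_line(line):
    """Return [title, count, rate], or None if the line does not match."""
    match = REX.match(line)
    return list(match.groups()) if match else None

sample = [
    "Total Requests    :   337827  ( 6687/Sec) ",
    "Total Bytes     :  990388848  ( 20571 KB/Sec) ",
    "Current Connections   :    10 ",
]
results = [parse_line(line) for line in sample]
for row in results:
    print(row)
```

Lines without a parenthesised rate, such as "Current Connections", still return None here and would need their own pattern.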


That's cool, man! Thanks! (Will accept your answer in a few minutes) – cybertextron 2013-02-21 22:59:40

0

Instead of splitting on whitespace, you need to split on the other delimiters in your format. It might look something like this:

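for data in temp: 
    # Skip blank lines and lines that lack the "( ... )" part.
    if ':' not in data or '(' not in data or ')' not in data: 
        continue 
    first, rest = data.split(':', 1) 
    second, rest = rest.split('(', 1) 
    third, rest = rest.split(')', 1)  # third keeps its unit, e.g. '6687/Sec'
    print([x.strip() for x in (first, second, third)]) 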

The data appears to use fixed field widths, so it would be better to use fixed slices – 2013-02-21 23:12:33

1

A regular expression is overkill for parsing your data, but it is a convenient way to express fixed-length fields. For example:

import re 

for data in temp: 
    first, second, third = re.match(r"(.{28}):(.{21})(.*)", data).groups() 
    ... 

This means the first field is 28 characters. The ':' is skipped, the next 21 characters are the second field, and the rest is the third field.
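The same layout can also be written with plain slices and no regular expression. This is only a sketch: the widths 28 and 21 are taken from the pattern above and may not match the real file, and `split_fixed`, `TITLE_WIDTH`, `VALUE_WIDTH` and the sample line are made up for illustration.

```python
# Assumed column widths, taken from the pattern above: 28 characters for
# the title, one ':' separator, then 21 characters for the value.
TITLE_WIDTH = 28
VALUE_WIDTH = 21

def split_fixed(line):
    """Slice a fixed-width line into [title, value, rest], each stripped."""
    title = line[:TITLE_WIDTH]
    value_start = TITLE_WIDTH + 1  # skip the ':' separator
    value = line[value_start:value_start + VALUE_WIDTH]
    rest = line[value_start + VALUE_WIDTH:]
    return [field.strip() for field in (title, value, rest)]

# Build a sample line that really has this layout (hypothetical data).
sample = "{:<28}:{:>21}{}".format("Total Requests", "337827", "  ( 6687/Sec)")
fields = split_fixed(sample)
print(fields)
```

Unlike the `re.match` version, slicing does not fail on short or blank lines; it just returns empty strings, so you may want to skip lines whose title comes back empty.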