How do I correctly split the following string? - Python

I have the following file to parse:

Total Virtual Clients  :    10  (1 Machines) 
Current Connections   :    10 
Total Elapsed Time   :    50 Secs (0 Hrs,0 Mins,50 Secs) 

Total Requests    :   337827  ( 6687/Sec) 
Total Responses    :   337830  ( 6687/Sec) 
Total Bytes     :  990388848  ( 20571 KB/Sec) 
Total Success Connections :   3346  (  66/Sec) 
Total Connect Errors  :    0  (  0/Sec) 
Total Socket Errors   :    0  (  0/Sec) 
Total I/O Errors   :    0  (  0/Sec) 
Total 200 OK    :   33864  ( 718/Sec) 
Total 30X Redirect   :    0  (  0/Sec) 
Total 304 Not Modified  :    0  (  0/Sec) 
Total 404 Not Found   :   303966  ( 5969/Sec) 
Total 500 Server Error  :    0  (  0/Sec) 
Total Bad Status   :   303966  ( 5969/Sec)

so I need to parse it to look up the values in the file. However, when I do this:

for data in temp:
    line = data.strip().split()
    print(line)

where temp is my temporary buffer containing these values, I get:

['Total', 'I/O', 'Errors', ':', '0', '(', '0/Sec)'] 
['Total', '200', 'OK', ':', '69807', '(', '864/Sec)'] 
['Total', '30X', 'Redirect', ':', '0', '(', '0/Sec)'] 
['Total', '304', 'Not', 'Modified', ':', '0', '(', '0/Sec)'] 
['Total', '404', 'Not', 'Found', ':', '420953', '(', '5289/Sec)'] 
['Total', '500', 'Server', 'Error', ':', '0', '(', '0/Sec)']

but what I want is:

['Total I/O Errors', '0', '0'] 
['Total 200 OK', '69807', '864'] 
['Total 30X Redirect', '0', '0']

and so on. How can I do that?

Source

2013-02-21 cybertextron

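You can use a regular expression, as follows:

import re

# Raw string so the backslashes reach the regex engine unchanged
rex = re.compile(r'([^:]+\S)\s*:\s*(\d+)\s*\(\s*(\d+)/Sec\)')
for line in temp:
    match = rex.match(line)
    if match:
        # groups() returns a tuple; convert it to get a list
        print(list(match.groups()))

which will give you: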

['Total Requests', '337827', '6687'] 
['Total Responses', '337830', '6687'] 
['Total Success Connections', '3346', '66'] 
['Total Connect Errors', '0', '0'] 
['Total Socket Errors', '0', '0'] 
['Total I/O Errors', '0', '0'] 
['Total 200 OK', '33864', '718'] 
['Total 30X Redirect', '0', '0'] 
['Total 304 Not Modified', '0', '0'] 
['Total 404 Not Found', '303966', '5969'] 
['Total 500 Server Error', '0', '0'] 
['Total Bad Status', '303966', '5969']

Note that this only matches lines of the form "TITLE : NUMBER (NUMBER/Sec)". You can adjust the expression to match the other lines as well.

Source
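
As one possible adjustment, here is a minimal sketch that also accepts an optional unit in front of "/Sec", so the "Total Bytes" line (20571 KB/Sec) is matched too. The sample list temp is an assumption, copied from a few lines of the question's file; the other line shapes (Machines, Secs) would still need their own pattern.

```python
import re

# Assumed sample lines, copied from the question's file
temp = [
    "Total Virtual Clients  :    10  (1 Machines) ",
    "Total Elapsed Time   :    50 Secs (0 Hrs,0 Mins,50 Secs) ",
    "Total Requests    :   337827  ( 6687/Sec) ",
    "Total Bytes     :  990388848  ( 20571 KB/Sec) ",
    "Total I/O Errors   :    0  (  0/Sec) ",
]

# Same expression as above, plus an optional alphabetic unit (e.g. "KB")
# between the rate and "/Sec"
rex = re.compile(r'([^:]+\S)\s*:\s*(\d+)\s*\(\s*(\d+)\s*[A-Za-z]*/Sec\)')

rows = []
for line in temp:
    match = rex.match(line)
    if match:
        rows.append(list(match.groups()))

for row in rows:
    print(row)
```

The first two sample lines are skipped because they do not end in "/Sec)".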

2013-02-21 22:58:09 isedev

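That's cool, man! Thanks! (I'll accept your answer in a few minutes.) – cybertextron 2013-02-21 22:59:40

Instead of splitting on whitespace, you need to split on the other delimiters in your format. It might look like this:

for data in temp:
    if ':' not in data or '(' not in data:
        continue  # skip blank lines and lines without a "(...)" part
    first, rest = data.split(':', 1)
    second, rest = rest.split('(', 1)
    third = rest.split(')', 1)[0]
    # third still carries its unit, e.g. '6687/Sec'
    print([x.strip() for x in (first, second, third)])

Source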
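
If you want the bare rate ('718' rather than '718/Sec'), one more split on '/' gets you there. This is only a sketch of the same delimiter idea; parse_line is a hypothetical helper name, and it returns None for lines that lack a ':' or a '('.

```python
def parse_line(data):
    """Return [name, total, rate], or None if the line has another shape."""
    if ':' not in data or '(' not in data:
        return None
    name, rest = data.split(':', 1)
    total, rest = rest.split('(', 1)
    rate = rest.split(')', 1)[0]
    # Keep only what precedes '/', e.g. ' 718/Sec' -> ' 718'
    rate = rate.split('/', 1)[0]
    return [name.strip(), total.strip(), rate.strip()]

print(parse_line("Total 200 OK    :   33864  ( 718/Sec) "))
print(parse_line("Current Connections   :    10 "))
```

Note that for the "Total Bytes" line the third value would come out as '20571 KB', since the unit sits in front of the '/'.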

2013-02-21 23:01:01

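The data appears to have fixed field widths, so it would be better to use fixed slices – 2013-02-21 23:12:33

A regular expression is overkill for parsing your data, but it is a convenient way to express fixed-length fields. For example:

import re

for data in temp:
    first, second, third = re.match(r"(.{28}):(.{21})(.*)", data).groups()
    ...

This means the first field is 28 characters. Skip the ':', then the next 21 characters are the second field, and the rest is the third field.

Source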
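
The same positional split can be written with plain slices and no regular expression. The sketch below assumes the widths 28 and 21 used above; split_fixed is a hypothetical helper, and the sample line is built artificially to match those widths, so check them against your real file.

```python
def split_fixed(line, name_width=28, value_width=21):
    """Split a fixed-width line into [name, value, rest] by position."""
    name = line[:name_width]
    # line[name_width] is the ':' separator, so skip one character
    start = name_width + 1
    value = line[start:start + value_width]
    rest = line[start + value_width:]
    return [name.strip(), value.strip(), rest.strip()]

# Hypothetical sample line padded to the assumed widths
sample = "Total Requests".ljust(28) + ":" + "337827".rjust(21) + "  ( 6687/Sec)"
print(split_fixed(sample))
```

The third field still contains the parentheses and the unit, just as it does with the regular expression.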

2013-02-21 23:09:46
