2016-04-05

Python: looping through values in a string

I'm trying to pull a number of values out of a fairly complex string that looks like this:

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]' 

These are the values I need to scan for:

list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault'] 

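My intention is to get the 3 numbers that follow each of these strings. For HighPriority, for example, I would get [0, 74, 74], and then I can perform some operation on each item.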

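I've used the following, but it doesn't handle values that are not followed by a comma (the last item in each bracketed group ends with ']' instead).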

def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


for l in list:
    print(l)
    print(find_between(s, l + ':', ',').split(':'))
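For comparison, a minimal sketch of the same s.index() idea that stops at whichever comes first, ',' or ']'. It assumes those are the only two characters that can end a value, which holds for the sample string but is not confirmed for the real log format. find_values and names are hypothetical names (names also avoids shadowing the built-in list).

```python
# Sketch: same s.index() approach as find_between, but a value ends at the first
# ',' or ']' after it (assumed to be the only terminators in this log format).
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
names = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']


def find_values(s, first, terminators=',]'):
    """Return the ':'-separated values after `first`, or [] if `first` is absent."""
    try:
        start = s.index(first) + len(first)
    except ValueError:
        return []
    # str.find returns -1 when a terminator does not occur after `start`.
    ends = [pos for pos in (s.find(t, start) for t in terminators) if pos != -1]
    end = min(ends) if ends else len(s)
    return s[start:end].split(':')


for name in names:
    print(name, find_values(s, name + ':'))
```

The ',' and ']' cases are both covered: 'Special:0:2:2' ends at ']', while 'Compiler:0:0:0' ends at ','.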

I think the best way to solve this is to learn to use the standard library module 're'. – mkiever


Yes, my regex is terrible. I've tried using re, but when I see blocks of code like '\d\w\++\?\(\)' I freeze up, because it doesn't come easily to me :) – whoisearth


Something like 'r = re.search('Compiler:([0-9]+):([0-9]+):([0-9]+)', s)' should get you started; then use 'r.groups()' to get the three substrings containing the numbers. – mkiever
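A sketch of that suggestion applied to every name in the list, converting the three groups to ints. It assumes the names contain no regex metacharacters (true for the ones in the question), so they are concatenated into the pattern unescaped; values and names are hypothetical names.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
names = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

values = {}
for name in names:
    # The pattern from the comment, with each name substituted for 'Compiler'.
    # The names are plain words, so no escaping is needed here.
    r = re.search(name + ':([0-9]+):([0-9]+):([0-9]+)', s)
    if r is not None:  # re.search returns None when nothing matches
        values[name] = [int(g) for g in r.groups()]

print(values['HighPriority'])  # [0, 74, 74]
```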

Answers


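Edit: if you really want to avoid regular expressions, your approach works with a small tweak (I renamed list to l to avoid shadowing the built-in type):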

from itertools import takewhile
from string import digits

def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit
        # You can also just cast this into a list and forget about joining and later splitting.
        # Also, consider storing ':'+digits in a variable to avoid recreating it all the time
        return ''.join(takewhile(lambda char: char in ':'+digits, s[start:]))
    except ValueError:
        return ""


for word in l:
    print(word)
    print(find_between(s, word + ':').split(':'))

This prints:

Compiler 
['0', '0', '0'] 
HighPriority 
['0', '74', '74'] 
Default 
['6', '1872', '1874'] 
LowPriority 
['0', '2', '2'] 
Special 
['0', '2', '2'] 
Event 
['0', '0', '0'] 
CommHigh 
['0', '1134', '1152'] 
CommDefault 
['0', '4', '4'] 
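Following the comment in the code about not rebuilding ':'+digits on every call, a sketch that builds the allowed characters once and returns ints directly. ALLOWED and find_values are hypothetical names; the l list is written out so the block runs on its own.

```python
from itertools import takewhile
from string import digits

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
l = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

# Built once instead of on every call.
ALLOWED = set(':' + digits)


def find_values(s, first):
    """Return the ':'-separated numbers right after `first` as ints ([] if absent)."""
    try:
        start = s.index(first) + len(first)
    except ValueError:
        return []
    # Stops at the first character that is neither ':' nor a digit (',' or ']').
    run = ''.join(takewhile(lambda char: char in ALLOWED, s[start:]))
    # Skip empty pieces, e.g. if the run were to end with a ':'.
    return [int(part) for part in run.split(':') if part]


for word in l:
    print(word, find_values(s, word + ':'))
```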

However, this really is a job for regular expressions, and you should try to learn the basics.

import re

def find_between(s, word):
    # Search for your (word followed by ((:digits) repeated three times))
    x = re.search(r"(%s(:\d+){3})" % word, s)
    return x.groups()[0]

for word in l:
    print(find_between(s, word).split(':', 1)[-1].split(':'))

This prints:

['0', '0', '0'] 
['0', '74', '74'] 
['6', '1872', '1874'] 
['0', '2', '2'] 
['0', '2', '2'] 
['0', '0', '0'] 
['0', '1134', '1152'] 
['0', '4', '4'] 
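The pattern above relies on 'Default' appearing before 'CommDefault' in the string; "(%s...)" % 'Default' would also match the tail of 'CommDefault'. A sketch that adds a \b word boundary (and re.escape, in case a name ever contains regex symbols; not needed for these names) so the order no longer matters. find_values is a hypothetical name.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'


def find_values(s, word):
    """Return the three ints after `word`, or None if there is no such match."""
    # \b requires a word boundary before `word`, so 'Default' cannot match the
    # tail of 'CommDefault'; re.escape protects names containing regex symbols.
    pattern = r'\b' + re.escape(word) + r':(\d+):(\d+):(\d+)'
    m = re.search(pattern, s)
    return [int(g) for g in m.groups()] if m else None


print(find_values(s, 'Default'))  # [6, 1872, 1874]
# Still finds 'Default' even when 'CommDefault' comes first.
print(find_values('CommDefault:0:4:4,Default:6:1872:1874', 'Default'))  # [6, 1872, 1874]
```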

This gets you all the groups, provided the string is always formed this way:

re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

This also picks up the timestamp (23:50:06:242), which you can easily remove from the list.
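One way to drop the timestamp without relying on its position: keep only tuples whose first element is one of the names you asked for. A sketch, assuming the list of wanted names is known up front; wanted and groups are hypothetical names.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
names = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

wanted = set(names)
# Each tuple is (name, n1, n2, n3) as strings. Filtering on the name also
# discards the timestamp match ('23', '50', '06', '242').
groups = [g for g in re.findall(r'(\w+):(\d+):(\d+):(\d+)', s) if g[0] in wanted]

for name, *numbers in groups:
    print(name, [int(n) for n in numbers])
```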

Or you can use a dict comprehension to organize the items:

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k: v.split(':') for k, v in matches[1:]}

I used matches[1:] here to get rid of the spurious match. You can do this if you know it will always be there.
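If the spurious match might not always be first, a sketch that instead skips any key made only of digits (the '23' from the timestamp) and stores ints. The assumption is that no real name is purely numeric.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
# Assumes real names are never all digits, so k.isdigit() identifies the
# timestamp fragment wherever it appears in `matches`.
my_dict = {k: [int(n) for n in v.split(':')] for k, v in matches if not k.isdigit()}

print(my_dict['HighPriority'])  # [0, 74, 74]
```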


Try this:

import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    data.append(re.findall(x + ':([0-9]+:[0-9]+:[0-9]+)', s))

data = [list(map(lambda m: m.split(':'), x)) for x in data]  # split each match on ':'
data = [x[0] for x in data]  # keep the first match ('Default:' also matches inside 'CommDefault')
data = [list(map(int, x)) for x in data]  # convert to int
print(data)

>>>[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]
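The three list-rewriting passes can be collapsed into one loop: re.search returns only the leftmost match, which is the same one the x[0] step selects. A sketch that assumes every name is present (otherwise m is None and m.group raises AttributeError).

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

data = []
for name in search:
    # Leftmost match only; assumes each name occurs at least once in `s`.
    m = re.search(name + ':([0-9]+:[0-9]+:[0-9]+)', s)
    data.append([int(n) for n in m.group(1).split(':')])

print(data)
```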