Python - looping through values in a string

I am trying to extract a number of values from a fairly complex string that looks like this:

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

These are the names I need to scan for:

list = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

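My intent is to get the 3 numbers that follow each name, so for HighPriority I would get [0, 74, 74], and I can then perform some operation on each item.

I have been using the code below, but it does not account for entries that do not end with a comma.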

def find_between(s, first, last):
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""


for l in list:
    print(l)
    print(find_between(s, l + ':', ',').split(':'))
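To make the failure concrete, here is a small sketch that runs the same helper against the sample string for three of the names. It only restates the code above; the variable names high, low and last are illustrative and not part of the original question.

```python
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

def find_between(s, first, last):
    """Return the text between `first` and the next `last`, or "" if either is missing."""
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

# HighPriority is followed by a comma, so the helper behaves as intended.
high = find_between(s, 'HighPriority:', ',')
# LowPriority is followed by ']', so the helper runs on until a much later comma.
low = find_between(s, 'LowPriority:', ',')
# CommDefault is the last entry: no comma follows, index() raises, "" is returned.
last = find_between(s, 'CommDefault:', ',')

print(high.split(':'))
print(low.split(':')[:4])  # first few pieces only; the rest is unrelated text
print(repr(last))
```

Only the entries that happen to be followed by a comma come back clean, which is why the delimiter-based approach needs either a different stop condition or a regular expression.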

Source

2016-04-05 whoisearth

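I think the best way to solve this is to learn to use the standard library module 're'. – mkiever

Yes, my re skills are terrible. I have tried using re, but when I see a block of code like '\d\w\++\?\(\)' I freeze, because it does not come easily to me. – whoisearth

Something like r = re.search('Compiler:([0-9]+):([0-9]+):([0-9]+)', s) should get you started. Use r.groups() to get the three substrings containing the numbers. – mkiever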
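As a minimal sketch of that suggestion, the snippet below applies the pattern from the comment to the sample string. The None check and the conversion to int are additions for illustration, and compiler_values is a hypothetical name.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

# Each ([0-9]+) is a capture group holding one run of digits.
r = re.search(r'Compiler:([0-9]+):([0-9]+):([0-9]+)', s)

if r is not None:
    # groups() returns the three captured substrings as a tuple of strings.
    compiler_values = [int(g) for g in r.groups()]
else:
    # re.search returns None when the pattern is not found.
    compiler_values = []

print(compiler_values)
```

Swapping 'Compiler' for any other name in the list gives the three numbers for that entry.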

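Edit: if you really want to avoid regular expressions, your approach works with a small tweak (I renamed list to l to avoid shadowing the built-in type):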

from itertools import takewhile
from string import digits

def find_between(s, first):
    try:
        start = s.index(first) + len(first)
        # Keep taking the next character while it's either a ':' or a digit.
        # You can also just cast this into a list and forget about joining and later splitting.
        # Also, consider storing ':'+digits in a variable to avoid recreating it all the time.
        return ''.join(takewhile(lambda char: char in ':' + digits, s[start:]))
    except ValueError:
        return ""


for word in l:
    print(word)
    print(find_between(s, word + ':').split(':'))

This prints:

Compiler 
['0', '0', '0'] 
HighPriority 
['0', '74', '74'] 
Default 
['6', '1872', '1874'] 
LowPriority 
['0', '2', '2'] 
Special 
['0', '2', '2'] 
Event 
['0', '0', '0'] 
CommHigh 
['0', '1134', '1152'] 
CommDefault 
['0', '4', '4']

However, this really is a job for regular expressions, and you should try to learn the basics.

import re

def find_between(s, word):
    # Search for word followed by (':' and one or more digits) repeated three times.
    # re.escape keeps any special characters in word from being read as regex syntax.
    x = re.search(r"(%s(:\d+){3})" % re.escape(word), s)
    return x.group(1)

for word in l:
    print(find_between(s, word).split(':', 1)[-1].split(':'))

This prints:

['0', '0', '0'] 
['0', '74', '74'] 
['6', '1872', '1874'] 
['0', '2', '2'] 
['0', '2', '2'] 
['0', '0', '0'] 
['0', '1134', '1152'] 
['0', '4', '4']
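One possible hardening of the regex version, offered as a sketch rather than as part of the original answer: capture the three numbers directly, return None when the name is absent, and add a word boundary so that a name such as Default cannot match inside CommDefault. The function name find_numbers is hypothetical.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
l = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']

def find_numbers(s, word):
    """Return the three integers that follow `word`, or None if `word` is absent."""
    # \b requires a word boundary before the name, so 'Default' does not
    # match the tail of 'CommDefault'. re.escape protects special characters.
    pattern = r'\b%s:(\d+):(\d+):(\d+)' % re.escape(word)
    match = re.search(pattern, s)
    if match is None:
        return None
    return [int(g) for g in match.groups()]

for word in l:
    print(word, find_numbers(s, word))
```

In the sample string Default happens to appear before CommDefault, so the unanchored pattern also works there; the boundary only matters if the order of entries can change.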

Source

2016-04-05 21:07:24 Bahrom

This will get you all the groups, provided the string is always well formed:

re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

This also picks up the time, which you can easily remove from the list.
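A short sketch of what that looks like on the sample string. Filtering on an all-digit first group is one assumed way to drop the timestamp; the names matches and queues are illustrative.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

# findall returns one tuple of four strings per match.
matches = re.findall(r'(\w+):(\d+):(\d+):(\d+)', s)

# The first tuple comes from the timestamp '23:50:06:242', not from a queue name.
# Dropping every match whose first group is all digits removes it without
# relying on its position in the list.
queues = [m for m in matches if not m[0].isdigit()]

for name, a, b, c in queues:
    print(name, [int(a), int(b), int(c)])
```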

Or you can use a dictionary comprehension to organize the items:

matches = re.findall(r'(\w+):(\d+:\d+:\d+)', s)
my_dict = {k: v.split(':') for k, v in matches[1:]}

I used matches[1:] here to get rid of the spurious match (the timestamp). You can do this if you know it will always be there.
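If the timestamp might not always be present, one alternative is to stop it from matching in the first place. The sketch below assumes that every queue name starts with a letter, which holds for the sample string but is not stated in the question; it also converts the values to integers.

```python
import re

s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'

# [A-Za-z]\w* requires the name to begin with a letter, so the timestamp
# '23:50:06:242' cannot match and no slicing is needed.
matches = re.findall(r'([A-Za-z]\w*):(\d+:\d+:\d+)', s)

# Map each name to its three numbers as integers.
my_dict = {k: [int(n) for n in v.split(':')] for k, v in matches}

print(my_dict['CommHigh'])
```

Looking values up by name, as in my_dict['CommHigh'], then replaces the loop over the list of names.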

Source

2016-04-05 20:53:47

Check this:

import re
s = '04/03 23:50:06:242[76:Health]: (mem=188094936/17146904576) Queue Size[=:+:-] : Core[Compiler:0:0:0,HighPriority:0:74:74,Default:6:1872:1874,LowPriority:0:2:2]:Special[Special:0:2:2]:Event[Event:0:0:0]:Comm[CommHigh:0:1134:1152,CommDefault:0:4:4]'
search = ['Compiler', 'HighPriority', 'Default', 'LowPriority', 'Special', 'Event', 'CommHigh', 'CommDefault']
data = []
for x in search:
    data.append(re.findall(x + r':([0-9]+:[0-9]+:[0-9]+)', s))

# List comprehensions instead of map() so the result is a list on Python 3 as well.
data = [[m.split(':') for m in x] for x in data]  # remove :
data = [x[0] for x in data]  # remove unnecessary []
data = [[int(n) for n in x] for x in data]  # convert to int
print(data)

>>>[[0, 0, 0], [0, 74, 74], [6, 1872, 1874], [0, 2, 2], [0, 2, 2], [0, 0, 0], [0, 1134, 1152], [0, 4, 4]]

Source

2016-04-05 21:07:18 Milor123
