Best approach in Python

-2

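I need to be pointed in the right direction on this problem: finding multiple strings to log my work. Best approach in Python

Say I am reading the output of a C program like this: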

while True: 
    ln = p.stdout.readline() 
    if '' == ln: 
        break
    #do stuff here with ln

And my output looks like these lines:

TrnIq: Thread on CPU 37 
TrnIq: Thread on CPU 37 but will be moved to CPU 44 
IP-Thread on CPU 33 
FANOUT Thread on CPU 37 
Filter-Thread on CPU 38 but will be moved to CPU 51 
TRN TMR Test 2 Supervisor Thread on CPU 34 
HomographyWarp Traking Thread[0] on CPU 26

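I want to capture "TrnIq: Thread" and "37" as two separate variables, a string and a number, from the output "TrnIq: Thread on CPU 37".

The other lines work much the same way, e.g. capturing "HomographyWarp Traking Thread[0] on" and the number "26" from "HomographyWarp Traking Thread[0] on CPU 26".

The only real challenge is a line like "Filter-Thread on CPU 38 but will be moved to CPU 51", where I need "Filter-Thread" and the number "51", not the first number "38".

Python has so many different ways to do this that I don't even know where to start!

Thanks in advance!

Source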

2012-07-20 NASA Intern

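"Thanks" for the unacceptance ... – 2012-07-24 20:15:10

Regular expressions seem like a bit of overkill to me here. [Disclaimer: I am not fond of regular expressions but I do like Python, so wherever possible I write Python rather than a regex. For reasons I have never fully understood, this is considered surprising.]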

s = """TrnIq: Thread on CPU 37 
TrnIq: Thread on CPU 37 but will be moved to CPU 44 
IP-Thread on CPU 33 
FANOUT Thread on CPU 37 
Filter-Thread on CPU 38 but will be moved to CPU 51 
TRN TMR Test 2 Supervisor Thread on CPU 34 
HomographyWarp Traking Thread[0] on CPU 26""" 

for line in s.splitlines(): 
    words = line.split() 
    if not ("CPU" in words and "on" in words): continue # skip uninteresting lines 
    prefix_words = words[:words.index("on")+1] 
    prefix = ' '.join(prefix_words) 
    cpu = int(words[-1]) 
    print((prefix, cpu))  # prints a tuple on both Python 2 and Python 3

gives

('TrnIq: Thread on', 37) 
('TrnIq: Thread on', 44) 
('IP-Thread on', 33) 
('FANOUT Thread on', 37) 
('Filter-Thread on', 51) 
('TRN TMR Test 2 Supervisor Thread on', 34) 
('HomographyWarp Traking Thread[0] on', 26)

and I don't think I need to translate any of this code into English.

Source

2012-07-20 17:08:10 DSM

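Yes! I can understand it just by looking at it! So much simpler than a regex! But which way do you think is more efficient? – 2012-07-20 17:22:30

What is efficiency? I think of it as the total time (time to write, time to debug, time to run, time to modify to handle cases I did not predict) it takes to get the output I need. Regexes will usually (not always) win on run-time performance. I think they only win on code in a use case I rarely find myself in: a little too complicated for the basic tools, yet not complicated enough to justify using a real parser. But opinions and situations vary. – DSM 2012-07-20 17:36:55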
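To put rough numbers on the run-time part of that trade-off, one could time both approaches on a few sample lines. This is only a sketch: the helper names parse_split and parse_regex are made up for illustration, the regex is just one of several possible patterns, and the timings will differ by machine and by input.

```python
import re
import timeit

LINES = [
    "TrnIq: Thread on CPU 37",
    "TrnIq: Thread on CPU 37 but will be moved to CPU 44",
    "IP-Thread on CPU 33",
    "Filter-Thread on CPU 38 but will be moved to CPU 51",
    "HomographyWarp Traking Thread[0] on CPU 26",
    "open_sock/bind OK, sock = 79",  # an uninteresting line
]

# Assumed pattern: prefix up to the first " on", then the last number on the line.
PATTERN = re.compile(r"^(.*? on) CPU .*?(\d+)\s*$")

def parse_split(line):
    """Split-based parsing, following the answer above."""
    words = line.split()
    if not ("CPU" in words and "on" in words):
        return None
    prefix = " ".join(words[:words.index("on") + 1])
    return prefix, int(words[-1])

def parse_regex(line):
    """Regex-based parsing with a precompiled pattern."""
    m = PATTERN.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

# Time each parser over the sample lines; only the relative numbers matter.
for fn in (parse_split, parse_regex):
    seconds = timeit.timeit(lambda: [fn(ln) for ln in LINES], number=2000)
    print(fn.__name__, round(seconds, 4))
```

Both helpers return the same (prefix, cpu) tuples for the sample lines, so the comparison is at least like for like; whether the difference matters depends on how many lines the C program emits.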

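I keep getting an error saying that "on" was not found? – 2012-07-20 17:47:40

The following should return a tuple with the information, assuming ln is a single line of your data (edited to include converting the CPU value to an int):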

import re

# Keep the match object itself; calling .groups() on a failed match (None) would raise.
match = re.match(r'(.*?)(?: on CPU.*)?(?: (?:on|to) CPU )(.*)', ln)
if match:
    proc, cpu = match.groups()
    cpu = int(cpu)

Example:

>>> import re 
>>> for ln in lines: 
...     print(re.match(r'(.*?)(?: on CPU.*)?(?: (?:on|to) CPU )(.*)', ln).groups())
... 
('TrnIq: Thread', '37') 
('TrnIq: Thread', '44') 
('IP-Thread', '33') 
('FANOUT Thread', '37') 
('Filter-Thread', '51') 
('TRN TMR Test 2 Supervisor Thread', '34') 
('HomographyWarp Traking Thread[0]', '26')

Explanation:

(.*?)               # capture zero or more characters at the start of the string,
                    # as few characters as possible
(?: on CPU.*)?      # optionally match ' on CPU' followed by any number of characters,
                    # do not capture this
(?: (?:on|to) CPU ) # match ' on CPU ' or ' to CPU ', but don't capture
(.*)                # capture the rest of the line

Rubular: http://www.rubular.com/r/HqS9nGdmbM

Source

2012-07-20 16:53:03

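So this returns a tuple containing 2 strings for each line? If I want to convert the numbers from strings to actual numbers, does Python have a strToNum function? – 2012-07-20 17:07:35

@NASAIntern - You can change the '.*' at the end to '\d+', so that instead of grabbing the rest of the line you grab only the digits. Then you can use the built-in 'int()' function to convert the string to a number. – 2012-07-20 17:13:26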
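As a small sketch of that suggestion: with '\d+' as the last group, the second captured value contains only digits, and 'int()' turns it into a number. The function name parse_line is only for illustration, and the pattern assumes there is a single space after 'CPU'.

```python
import re

# Same pattern as in the answer, but the last group is \d+ instead of .*
PATTERN = re.compile(r'(.*?)(?: on CPU.*)?(?: (?:on|to) CPU )(\d+)')

def parse_line(ln):
    """Return (process_name, cpu_number), or None if the line does not match."""
    m = PATTERN.match(ln)
    if m is None:
        return None
    proc, cpu = m.groups()
    return proc, int(cpu)  # int() converts the digit string to a number

print(parse_line("Filter-Thread on CPU 38 but will be moved to CPU 51"))  # ('Filter-Thread', 51)
print(parse_line("open_sock/bind OK, sock = 79"))                         # None
```

Returning None for lines that do not match keeps the caller's loop simple when most of the output is uninteresting.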

Why the downvote? – 2012-07-20 17:27:19

So, use the regular expression ^(.*?)\s+on\s+CPU.*(?<=\sCPU)\s+(\d+)\s*$

import sys 
import re 

for ln in sys.stdin:
    m = re.match(r'^(.*?)\s+on\s+CPU.*(?<=\sCPU)\s+(\d+)\s*$', ln)
    if m is not None:
        print(m.groups())

See and test the example here.

Source

2012-07-20 16:54:26

I guess I should clarify that these are only the output lines I am interested in. There are thousands of other lines: _process_trn_ip_rslts: switching to TRN_FILTER_PROPAGATING state. trn_filter: total_update_timer = 0.057454 seconds. trn_ib->trn_ib_state.ip_part = 1 DISPOSITION ACCEPT VALUE = 2 -a *_000004.pgm DISPOSITION ACCEPT sent to socket 6 TrnIb frame 4: sending image ID = (1003080551, 750074, framecnt 4) via socket IP = IB. open_sock/bind OK, sock = 79 create_cmd_sock/listen OK, erc = 0 trn_filter, instance 3: socket_from_cmd: 35_ – 2012-07-20 18:18:19

@NASAIntern - If you need to print the whole line, just print 'ln' instead of 'm.groups()'. – Ωmega 2012-07-20 18:39:44

In the cases you mention, you always want the last CPU number on the line, so it can be done with a single regular expression:

# Test program 
import re 

lns = [ 
    "TrnIq: Thread on CPU 37", 
    "TrnIq: Thread on CPU 37 but will be moved to CPU 44", 
    "IP-Thread on CPU 33", 
    "FANOUT Thread on CPU 37", 
    "Filter-Thread on CPU 38 but will be moved to CPU 51", 
    "TRN TMR Test 2 Supervisor Thread on CPU 34", 
    "HomographyWarp Traking Thread[0] on CPU 26" 
] 

for ln in lns: 
    # Raw string so that \S and \d reach the regex engine unchanged
    test = re.search(r"(?P<process>.*Thread\S* on).* CPU (?P<cpu>\d+)$", ln)
    print("%s: '%s' on CPU #%s" % (ln, test.group('process'), test.group('cpu')))

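In the general case you may want to distinguish between situations (e.g. a thread on one CPU, a moved thread, a sub-thread...). To do that, you can use several re.search() calls one after another. For example: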

# This search recognizes lines of the form "...Thread on CPU so-and-so", and
# also lines that add "...but will be moved to CPU some-other-cpu".
test = re.search(r"(?P<process>.*Thread) on CPU (?P<cpu1>\d+)( but will be moved to CPU (?P<cpu2>\d+))?", ln)
if test:
    # Here we capture the process thread, both moved and non-moved
    process = test.group('process')
    if test.group('cpu2'):
        # We have process, cpu1 and cpu2: moved thread
        cpu = int(test.group('cpu2'))
    else:
        # Non-moved task, we have test.group('process') and cpu1
        cpu = int(test.group('cpu1'))
else:
    # No match, try some other regexp. For example processes with a thread number
    # between square brackets: "Thread[0]", which are not captured by the regex above.
    test = re.search(r"(?P<process>.*) Thread\[(?P<thread>\d+)\] on CPU (?P<cpu1>\d+)", ln)
    if test:
        # Here we have "HomographyWarp Traking" in process, 0 in thread, 26 in cpu1
        process = test.group('process')
        cpu = int(test.group('cpu1'))

For best performance, the tests for the most frequent kinds of line should be done first.

Source

2012-07-20 16:55:26 LSerni

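I'm not sure I understood the second part, "you can use several re.search() calls ... For example:". What would I use? – 2012-07-20 18:07:09

I guess I should clarify that these are only the output lines I am interested in. There are thousands of other lines: _process_trn_ip_rslts: switching to TRN_FILTER_PROPAGATING state. trn_filter: total_update_timer = 0.057454 seconds. trn_ib->trn_ib_state.ip_part = 1 disposition accept value = 2 -a *_000004.pgm disposition ACCEPT sent to socket 6 frame 4 TrnIb: sending image id = (1003080551, 750074, framecnt 4) to IP via socket = 8 from IB. OK, sock = 79 create_cmd_sock/listen OK, erc = 0 trn_filter, instance 3: socket_from_cmd: 35_ – 2012-07-20 18:23:33

OK, so for each line you read, you would use re.search() to check it against one of several regular expressions. The first .search I provided recognizes lines like "...Thread...on...CPU". The other searches can be more targeted and more efficient. If there is a very common kind of line that you are not interested in, you can also try to recognize it first, so that you can discard it and save the subsequent comparisons. – LSerni 2012-07-20 21:42:27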
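A rough sketch of that idea, for illustration only: each line is first run through a cheap test that discards lines which cannot match, and is then checked against a short list of compiled patterns. The function name classify, the pattern list and the pre-check on the substring ' on CPU ' are assumptions, not something taken from the answer above.

```python
import re

# Ordered so that the (assumed) more frequent line shape is tried first.
PATTERNS = [
    # "... on CPU 37" -> take the only number
    re.compile(r"^(?P<process>.*?) on CPU (?P<cpu>\d+)\s*$"),
    # "... on CPU 38 but will be moved to CPU 51" -> take the second number
    re.compile(r"^(?P<process>.*?) on CPU \d+ but will be moved to CPU (?P<cpu>\d+)\s*$"),
]

def classify(ln):
    """Return (process, cpu) for an interesting line, else None."""
    if " on CPU " not in ln:  # cheap discard of the many unrelated lines
        return None
    for pattern in PATTERNS:
        m = pattern.search(ln)
        if m:
            return m.group("process"), int(m.group("cpu"))
    return None

for ln in ["Filter-Thread on CPU 38 but will be moved to CPU 51",
           "IP-Thread on CPU 33",
           "open_sock/bind OK, sock = 79"]:
    print(classify(ln))
```

The substring test is much cheaper than a regex search, so with thousands of unrelated lines most of the work is skipped before any pattern runs.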

This can be done very simply with two regular expression searches:

import re 

while True: 
    ln = p.stdout.readline() 
    if '' == ln: 
        break

    start_match = re.search(r'^(.*?) on', ln) 
    end_match = re.search(r'(\d+)$', ln) 
    process = start_match and start_match.group(0) 
    process_number = end_match and end_match.group(0)

Source

2012-07-20 17:00:06 mVChr

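Could you provide some details? I am still getting used to some of this Python syntax. – 2012-07-20 17:05:14

You can read the documentation for Python's regular expression module: http://docs.python.org/library/re.html – mVChr 2012-07-20 17:20:10

I get the regular expression, but not the "and" and the match.group() function. match.group() returns a string, right? – 2012-07-20 17:32:42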
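For what it is worth, a short sketch of how that idiom behaves: re.search() returns a match object or None, "x and y" evaluates to x when x is falsy and to y otherwise, and group(0) returns the whole matched text as a string. The function name first_and_last is only for illustration.

```python
import re

def first_and_last(ln):
    start_match = re.search(r'^(.*?) on', ln)
    end_match = re.search(r'(\d+)$', ln)
    # "m and m.group(0)" is None when there was no match, else the matched string
    process = start_match and start_match.group(0)
    number = end_match and end_match.group(0)
    return process, number

print(first_and_last("IP-Thread on CPU 33"))        # ('IP-Thread on', '33')
print(first_and_last("create_cmd_sock/listen OK"))  # (None, None)
```

Note that group(0) is the entire match, so the process string still ends in " on"; group(1) would give only the captured part before it, and the number is still a string until it is passed to int().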
