2012-05-27 33 views
1

Script: regular expression behaves unexpectedly

import re 

matches = ['hello', 'hey', 'hi', 'hiya'] 

def check_match(string):
    for item in matches:
        if re.search(item, string):
            print 'Match found: ' + string
        else:
            print 'Match not found: ' + string

check_match('hey') 
check_match('hello there') 
check_match('this should not match') 
check_match('oh, hiya') 

Output:

Match not found: hey 
Match found: hey 
Match not found: hey 
Match not found: hey 
Match found: hello there 
Match not found: hello there 
Match not found: hello there 
Match not found: hello there 
Match not found: this should not match 
Match not found: this should not match 
Match found: this should not match 
Match not found: this should not match 
Match not found: oh, hiya 
Match not found: oh, hiya 
Match found: oh, hiya 
Match found: oh, hiya 

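There are several things I don't understand here. For starters, each string is searched four times in this output, and some strings are reported as a found match more than once. I'm not sure what in my code causes this. Could someone take a look and point out what is wrong?

The expected result would be: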

Match found: hey 
Match found: hello there 
Match not found: this should not match 
Match found: oh, hiya 

What are you matching the regular expression against?

Answers

0

You get 4 searches and 4 lines of output for each string because you loop over the list, searching and printing a result for every element in it.

5

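It is not behaving incorrectly; you have misunderstood what re.search(...) does. It reports a match if the pattern occurs anywhere in the string.

See the comments added after your output: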

Match not found: hey     # because 'hello' is not in 'hey' 
Match found: hey      # because 'hey' is in 'hey' 
Match not found: hey     # because 'hi' is not in 'hey' 
Match not found: hey     # because 'hiya' is not in 'hey' 

Match found: hello there    # because 'hello' is in 'hello there' 
Match not found: hello there   # because 'hey' is not in 'hello there' 
Match not found: hello there   # because 'hi' is not in 'hello there' 
Match not found: hello there   # because 'hiya' is not in 'hello there' 

Match not found: this should not match # because 'hello' is not in 'this should not match' 
Match not found: this should not match # because 'hey' is not in 'this should not match' 
Match found: this should not match  # because 'hi' is in 'this should not match' 
Match not found: this should not match # because 'hiya' is not in 'this should not match' 

Match not found: oh, hiya    # because 'hello' is not in 'oh, hiya' 
Match not found: oh, hiya    # because 'hey' is not in 'oh, hiya' 
Match found: oh, hiya     # because 'hi' is in 'oh, hiya' 
Match found: oh, hiya     # because 'hiya' is in 'oh, hiya' 

If you don't want the pattern hi to match the input oh, hiya, surround your pattern with word boundaries:

\bhi\b 

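This makes it match only occurrences of hi that are not directly adjacent to other word characters (letters, digits, or underscores): well hiya there will not match the pattern \bhi\b, but well hi there will.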
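The difference is easy to check directly. Here is a minimal sketch using the two example sentences above; the variable names are only illustrative, and the single-argument print(...) form is used so it runs under both Python 2 and Python 3.

```python
import re

# \b matches at the boundary between a word character (letter, digit,
# underscore) and a non-word character or the start/end of the string.
# The raw string keeps Python from reading \b as a backspace character.
bounded = re.compile(r'\bhi\b')

plain_hit = re.search('hi', 'well hiya there')    # substring search succeeds
bounded_miss = bounded.search('well hiya there')  # None: 'hi' runs into 'ya'
bounded_hit = bounded.search('well hi there')     # match: 'hi' stands alone

print(plain_hit is not None)   # True
print(bounded_miss is None)    # True
print(bounded_hit.group())     # hi
```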

0

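The for loop checks the string against every entry in matches and prints "found" or "not found" for each one. What you really want is to check whether any of them matches and then print a single "found" or "not found". I don't actually know Python, so the syntax may be off.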

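# body of check_match(string)
found = False  # no pattern has matched yet
for item in matches:
    if re.search(item, string):
        found = True
        break  # one match is enough; stop searching
if found:
    print 'Match found: ' + string
else:
    print 'Match not found: ' + string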
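Note that a plain substring search still finds hi inside this, so the third test string would be reported as found. Combining the single-result idea with word boundaries gives the output the question expects. The following is a sketch rather than the answerer's code: it assumes Python's built-in any(), wraps each word in \b, and calls re.escape in case a word ever contains regex metacharacters. Returning the message is an addition made only so the result can be checked.

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    # any() stops at the first pattern that matches, so exactly one
    # line is printed per input string.
    found = any(
        re.search(r'\b' + re.escape(item) + r'\b', string)
        for item in matches
    )
    message = ('Match found: ' if found else 'Match not found: ') + string
    print(message)
    return message

check_match('hey')                    # Match found: hey
check_match('hello there')            # Match found: hello there
check_match('this should not match')  # Match not found: this should not match
check_match('oh, hiya')               # Match found: oh, hiya
```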


2

Try this. It is more concise, and it also reports multiple matches:

import re 

matches = ['hello', 'hey', 'hi', 'hiya'] 

def check_match(string): 
    results = [item for item in matches if re.search(r'\b%s\b' % (item), string)] 
    print 'Found %s' % (results) if len(results) > 0 else "No match found" 

check_match('hey') 
check_match('hello there') 
check_match('this should not match') 
check_match('oh, hiya') 
check_match('xxxxx xxx') 
check_match('hello and hey') 

This gives:

Found ['hey'] 
Found ['hello'] 
No match found 
Found ['hiya'] 
No match found 
Found ['hello', 'hey']
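The same check can also be done with one compiled pattern instead of one search per word. This is a sketch under assumptions: the names WORDS_RE and find_matches are hypothetical, and re.findall is used to collect every whole word that occurs. Unlike the list comprehension above, it returns the words in the order they appear in the input and repeats a word if it occurs more than once.

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

# Build one alternation, \b(?:hello|hey|hi|hiya)\b, escaping each word
# so that any regex metacharacters would be treated literally.
WORDS_RE = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(m) for m in matches))

def find_matches(string):
    # With no capturing groups, findall returns the full matched words.
    return WORDS_RE.findall(string)

print(find_matches('hello and hey'))          # ['hello', 'hey']
print(find_matches('oh, hiya'))               # ['hiya']
print(find_matches('this should not match'))  # []
```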