2016-04-27

Python regular expression to find a substring

Sample text:

port0 interface GigabitEthernet0/4/0 

port1 interface TenGigabitEthernet0/1/0 

login delay 2 

bfd-template single-hop BDI 

ip ftp source-interface Loopback0 

ip tftp source-interface Loopback0 

interface Loopback0 

interface Loopback100 

interface Loopback999 

description *** Loopback interface for management *** 

interface TenGigabitEthernet0/0/0 

mtu 9216 

carrier-delay msec 0 

interface TenGigabitEthernet0/1/0 

mtu 9216 

carrier-delay msec 0 

interface GigabitEthernet0/4/0 

mtu 9216 

interface GigabitEthernet0/4/1 

My regex is:

[T][e]((?:.|\n)*?[e][c]\s\d+) 

and I'm validating it on pythex.org. It matches the following:

TenGigabitEthernet0/1/0 

mtu 9216 

carrier-delay msec 0 

which is what I want. But it also matches:

TenGigabitEthernet0/1/0 

login delay 2 

bfd-template single-hop BDI 

ip ftp source-interface Loopback0 

ip tftp source-interface Loopback0 

interface Loopback0 

interface Loopback100 

interface Loopback999 

description *** Loopback interface for management *** 

interface TenGigabitEthernet0/0/0 

mtu 9216 

carrier-delay msec 0 

which I don't want. I'm looking for a multiline regex that matches exactly all the tengig-mtu-carrier-delay part(s) in my string, and nothing else.

What I've written is:

import re

# Read the whole file and strip the '\r\r\n' line endings
buffer_ = open(file, "rb")
sb = buffer_.read().replace('\r\r\n', '')
inf = re.compile(r'[T][e]((?:.|\n)*?[e][c]\s\d+)')
intf = inf.findall(sb)
print intf
buffer_.close()

It works perfectly for files where the tengig-mtu-carrier-delay lines are consecutive, but not so well when another tengig appears elsewhere in the file.
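A minimal Python 3 sketch that reproduces the over-match described above. The shortened `sample` string is an assumption made for illustration, not the real config file. The lazy `(?:.|\n)*?` begins at the first `Te` in the text and keeps expanding across lines until the first `ec`, whitespace and digit, so a stray TenGig line earlier in the file is swallowed into the match.

```python
import re

# Shortened, hypothetical sample: one stray TenGig line, then a real block.
sample = (
    "port1 interface TenGigabitEthernet0/1/0\n"
    "login delay 2\n"
    "interface TenGigabitEthernet0/0/0\n"
    "mtu 9216\n"
    "carrier-delay msec 0\n"
)

# The pattern from the question, unchanged.
pattern = re.compile(r'[T][e]((?:.|\n)*?[e][c]\s\d+)')
matches = pattern.findall(sample)

# The single match starts at the stray line and runs through "msec 0".
for m in matches:
    print(repr(m))
```

Because `.|\n` can cross line breaks, nothing in the pattern ties the `mtu` and `carrier-delay` lines to the interface line directly above them.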

+1

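Can you highlight what you have tried? SO is a place for problems that come up while programming, not for having your regex written for you. –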

+0

If you know what you want to match, why don't you match the exact string? –

+0

Thanks Alexander - I've pasted my code as well. I'm still researching the right regex and will post an answer too if I get it. – user6259926

Answers

0

I think this regex is what you want:

(tengig.*?(?:\n+)?\bmtu\b.*?(?:\n+)?\bcarrier-delay\b[^\n]+) 

Regex Demo

Regex breakdown

( #Start capturing group
tengig #Match tengig literally (case-insensitively, given re.IGNORECASE)
.*? #Lazily match the rest of the line until the next requirement is met
(?:\n+)? #Match one or more newlines (OPTIONAL)
\bmtu\b #Match the word mtu literally
.*? #Lazily match the rest of the line until the next requirement is met
(?:\n+)? #Match one or more newlines (OPTIONAL)
\bcarrier-delay\b #Match carrier-delay literally
[^\n]+ #Match everything up to the next newline
) #End capturing group
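The breakdown above can also be kept inside the pattern itself using `re.VERBOSE`. This is a Python 3 sketch; the shortened `sample` string is an assumption for illustration. `re.MULTILINE` is left out here because the pattern contains no `^` or `$` anchors, so the flag would not change what it matches.

```python
import re

pattern = re.compile(
    r"""
    (                     # start capturing group
      tengig              # literal text, case-insensitive via the flag
      .*?                 # rest of the interface line, lazily
      (?:\n+)?            # optional run of newlines
      \bmtu\b             # the word mtu
      .*?                 # rest of the mtu line, lazily
      (?:\n+)?            # optional run of newlines
      \bcarrier-delay\b   # the word carrier-delay
      [^\n]+              # everything up to the end of that line
    )                     # end capturing group
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Hypothetical sample: a stray TenGig line followed by one complete block.
sample = (
    "interface TenGigabitEthernet0/1/0\n\n"
    "login delay 2\n\n"
    "interface TenGigabitEthernet0/0/0\n\n"
    "mtu 9216\n\n"
    "carrier-delay msec 0\n"
)

blocks = pattern.findall(sample)
print(blocks)
```

Since `.` does not match a newline by default, the stray TenGig line cannot reach a later `mtu` line, and only the complete block is returned.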

Python code

import re
p = re.compile(r'(tengig.*?(?:\n+)?\bmtu\b.*?(?:\n+)?\bcarrier-delay\b[^\n]+)', re.MULTILINE | re.IGNORECASE)
test_str = "port0 interface GigabitEthernet0/4/0\n\nport1 interface TenGigabitEthernet0/1/0\n\nlogin delay 2\n\nbfd-template single-hop BDI\n\nip ftp source-interface Loopback0\n\nip tftp source-interface Loopback0\n\ninterface Loopback0\n\ninterface Loopback100\n\ninterface Loopback999\n\ndescription * Loopback interface for management *\n\ninterface TenGigabitEthernet0/0/0\n\nmtu 9216\n\ncarrier-delay msec 0\n\ninterface TenGigabitEthernet0/1/0\n\nmtu 9216\n\ncarrier-delay msec 0\n\ninterface GigabitEthernet0/4/0\n\nmtu 9216\n\ninterface GigabitEthernet0/4/1\n"
print(re.findall(p, test_str))

Ideone Demo

+0

This works fine and is what I want, but it does not work when I read a file containing text_str into a buffer and then try to print re.findall: inf = re.compile(r'(tengig.*?(?:\n+)?\bmtu\b.*?(?:\n+)?\bcarrier-delay\b[^\n]+)', re.MULTILINE | re.IGNORECASE) buffer_ = open(file, "rb").read() print(re.findall(inf, buffer_)) The output is just [] for each file iteration: [] [] [] [] [] [] [] [] [] [] [] [] [] – user6259926

+0

@user6259926 It's working fine for me. For a single file I used **[this](http://ideone.com/7mtY7a)** code – rock321987

+0

Yes, I read the whole file in one go. for file in os.listdir(currdir): inf = re.compile(ur'(tengig.*?(?:\n+)?\bmtu\b.*?(?:\n+)?\bcarrier-delay\b[^\n]+)', re.MULTILINE | re.IGNORECASE) buffer_ = open(file, "rb").read() intf = (re.findall(inf, buffer_)) print intf – user6259926
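The thread does not establish why the files return `[]`. One possible cause, offered only as an assumption, is that the files use `\r\n` (or `\r\r\n`) line endings, which the question's own code suggests. Opened in binary mode, each line break then contains a `\r` that `(?:\n+)?` cannot consume, so the pattern never reaches `mtu`. The Python 3 sketch below normalises line endings before searching. The helper name `find_blocks`, the sample bytes, and the UTF-8 decoding are all hypothetical.

```python
import re

PATTERN = re.compile(
    r'(tengig.*?(?:\n+)?\bmtu\b.*?(?:\n+)?\bcarrier-delay\b[^\n]+)',
    re.MULTILINE | re.IGNORECASE,
)

def find_blocks(raw):
    """Decode bytes read in binary mode, normalise line endings, then search."""
    # Assumption: the files are UTF-8 (or plain ASCII) text.
    text = raw.decode("utf-8", errors="replace")
    # Collapse "\r\r\n", "\r\n" and any stray "\r" into a single "\n".
    text = re.sub(r"\r+\n?", "\n", text)
    return PATTERN.findall(text)

# Hypothetical file content with Windows line endings.
crlf_bytes = (
    b"interface TenGigabitEthernet0/0/0\r\n\r\n"
    b"mtu 9216\r\n\r\n"
    b"carrier-delay msec 0\r\n"
)

print(PATTERN.findall(crlf_bytes.decode("utf-8")))  # searched without normalising
print(find_blocks(crlf_bytes))                      # searched after normalising
```

If the files turn out to have plain `\n` endings already, the cause lies elsewhere and this normalisation step changes nothing.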