2012-07-07 56 views
3

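Perl regex capture on a slurped file

Using Perl, I have slurped a large file containing the text below, and I am trying to capture into $1 every match of my regex in the file. My regex is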

=~ /((GET|PUT|POST|CONNECT).*?(Content-Type: (image\/jpeg)))/sgm 

Currently the text shown in bold is captured. However, the last capture takes in the lines

"GET /~sgtatham/putty/latest/x86/pscp.exe HTTP/1.1" to "Content-Type: text/html; charset=iso-8859-1" 

as part of the match, which it should not, because "text/html" does not match the (image\/jpeg) in my regex. I would like to get the last capture without

"GET /~sgtatham/putty/latest/x86/pscp.exe HTTP/1.1" to "Content-Type: text/html; charset=iso-8859-1" being included. 

Any help is appreciated, thank you.

**GET /~sgtatham/putty/latest/x86/pscp.exe HTTP/1.1 
Host: the.earth.li 
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20100101 Firefox/13.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Accept-Encoding: gzip, deflate 
Connection: Keep-Alive 
Content-Type: text/html; charset=iso-8859-1 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
<html><head> 
\.+" 
GET /~sgtatham/putty/0.62/x86/pscp.exe HTTP/1.1 
Host: the.earth.li 
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20100101 Firefox/13.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Content-Length: 315392 
Keep-Alive: timeout=15, max=99 
Connection: Keep-Alive 
Content-Type: image/jpeg** 
Platform: Digital Engagement Platform; Version: 1.1.0.0 
+2

I don't understand what you are trying to capture. Can you show us the output you expect? – 2012-07-07 05:46:46

Answers

3

You can do this easily with (?!pattern), which is a negative lookahead assertion. For a refresher, read the article Positive examples of positive and negative lookahead (ourcraft.wordpress.com).
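To see why the lookahead is needed, here is a minimal sketch on a toy string. The string and the variable names are made up for illustration and do not come from the question. A lazy `.*?` only keeps the match short from where it started, so it still begins at the first possible start marker. Guarding each consumed character with `(?!A)` makes that first attempt fail, and the engine moves on to the nearest start marker.

```perl
use strict;
use warnings;

# Toy input: two "records" that start with A; only the second ends in END2.
my $s = "A x END1 A y END2";

# Lazy quantifier alone: the match still begins at the first A.
my ($lazy) = $s =~ /(A.*?END2)/;

# Each consumed character is guarded by a negative lookahead,
# so the match cannot run across the start of another record.
my ($guarded) = $s =~ /(A(?:(?!A).)*?END2)/;

print "lazy:    $lazy\n";      # lazy:    A x END1 A y END2
print "guarded: $guarded\n";   # guarded: A y END2
```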

Regex

if ($text =~ /
(        # start capture 
    (?:GET|PUT|POST|CONNECT)  # start phrase 
    (?: 
     (?!GET|PUT|POST|CONNECT) # make sure none of these phrases starts here 
     .       # accept any character 
    )*?       # any number of times (not greedy) 
    Content-Type:\simage\/jpeg # end phrase 
)         # end capture 
/msx) { 
    print $1;   # print only when the match succeeded 
} 

All occurrences

while($text =~ m/REGEXP/msxg) { 

    print $1; 
} 
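As a runnable variant of the loop above, the following sketch precompiles the pattern with `qr` and collects every capture at once by matching in list context with the `g` flag. The sample text is a shortened, hypothetical stand-in for the slurped file, and `$request` and `@matches` are names chosen here for illustration.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample standing in for the slurped file.
my $text = <<'END';
GET /a.exe HTTP/1.1
Host: the.earth.li
Content-Type: text/html; charset=iso-8859-1
<html><head>
GET /b.exe HTTP/1.1
Host: the.earth.li
Content-Type: image/jpeg
POST /c HTTP/1.1
Content-Type: image/jpeg
END

my $request = qr{
    (                                  # capture one whole request
        (?:GET|PUT|POST|CONNECT)       # start phrase
        (?:
            (?!GET|PUT|POST|CONNECT)   # do not cross into another request
            .                          # any character, newlines included
        )*?
        Content-Type:\simage/jpeg      # end phrase
    )
}sx;

# In list context with the g flag, the match returns every $1 in order.
my @matches = $text =~ /$request/g;

print scalar(@matches), " match(es)\n";   # 2 match(es)
print "$_\n---\n" for @matches;
```

Note that the lookahead is not anchored, so it also stops at a method name that appears in the middle of a line, for example inside a header value. If that can happen in the real data, one option is to anchor the alternatives to the start of a line with `^` under the `m` flag.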

Output

GET /~sgtatham/putty/0.62/x86/pscp.exe HTTP/1.1 
Host: the.earth.li 
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20100101  Firefox/13.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Content-Length: 315392 
Keep-Alive: timeout=15, max=99 
Connection: Keep-Alive 
Content-Type: image/jpeg 
+0

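Thanks for your post, fxzuz, this looks close to what I want. Does this solution match every occurrence of the regex capture in a slurped file, or does it match only the first/last one and then stop? I need to match all occurrences anywhere in a large slurped file. Thanks – user1508213 2012-07-07 19:03:49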

+0

Yes, of course: just add g (global) to your regex and process the matches in a loop. I have added that code to the answer. – 2012-07-07 20:41:16

+0

Thanks, I will test it. Thanks again! – user1508213 2012-07-07 21:04:56