Perl regex capture on a slurped file

Using Perl, I have slurped a large file containing the text below, and I am trying to capture into $1 every match of my regex in the file. My regex is

=~ /((GET|PUT|POST|CONNECT).*?(Content-Type: (image\/jpeg)))/sgm

Currently the text shown in bold is captured. However, the last capture includes the lines

"GET /~sgtatham/putty/latest/x86/pscp.exe HTTP/1.1" to "Content-Type: text/html; charset=iso-8859-1"

as part of the match, which it should not, because "text/html" does not match the (image\/jpeg) part of my regex. I would like to get the last capture without

"GET /~sgtatham/putty/latest/x86/pscp.exe HTTP/1.1" to "Content-Type: text/html; charset=iso-8859-1" being included.

Any help is appreciated, thanks.

**GET /~sgtatham/putty/latest/x86/pscp.exe HTTP/1.1 
Host: the.earth.li 
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20100101 Firefox/13.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Accept-Encoding: gzip, deflate 
Connection: Keep-Alive 
Content-Type: text/html; charset=iso-8859-1 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 
<html><head> 
\.+" 
GET /~sgtatham/putty/0.62/x86/pscp.exe HTTP/1.1 
Host: the.earth.li 
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20100101 Firefox/13.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Content-Length: 315392 
Keep-Alive: timeout=15, max=99 
Connection: Keep-Alive 
Content-Type: image/jpeg** 
Platform: Digital Engagement Platform; Version: 1.1.0.0

2012-07-07 user1508213

I don't understand what you want to capture. Can you tell us what output you expect? – 2012-07-07 05:46:46

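This is easy to do with (?!pattern), which is a negative lookahead assertion. Your lazy .*? still runs past the first request because the match starts at the first GET and keeps extending until it finds "Content-Type: image/jpeg"; the lookahead stops it from crossing into the next request. For a refresher, read the article Positive examples of positive and negative lookahead (ourcraft.wordpress.com)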

Regex

$text =~/
(        # start capture 
    (?:GET|PUT|POST|CONNECT)  # start phrase 
    (?: 
     (?!GET|PUT|POST|CONNECT) # make sure none of these phrases starts here 
     .       # accept any character 
    )*?       # any number of times (not greedy) 
    Content-Type:\simage\/jpeg # end phrase 
)         # end capture 
/msx; 
print $1;

All occurrences

while($text =~ m/REGEXP/msxg) { 

    print $1; 
}

Output

GET /~sgtatham/putty/0.62/x86/pscp.exe HTTP/1.1 
Host: the.earth.li 
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:13.0) Gecko/20100101  Firefox/13.0 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 
Accept-Language: en-us,en;q=0.5 
Content-Length: 315392 
Keep-Alive: timeout=15, max=99 
Connection: Keep-Alive 
Content-Type: image/jpeg
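
Putting the two pieces above together, here is a minimal self-contained sketch of the global loop. The sample text is a shortened, hypothetical stand-in for the slurped file, and the names $method, $request and @matches are my own, not from the question. It only illustrates the negative-lookahead idea and is not tuned for very large inputs.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample standing in for the slurped file.
my $text = <<'END';
GET /a.exe HTTP/1.1
Host: the.earth.li
Content-Type: text/html; charset=iso-8859-1
<html><head>
GET /b.exe HTTP/1.1
Host: the.earth.li
Content-Type: image/jpeg
POST /c HTTP/1.1
Content-Type: image/jpeg
END

# Request methods that mark the start of a new request.
my $method = qr/(?:GET|PUT|POST|CONNECT)/;

# One request, from a method name to the image content type,
# without ever stepping over the start of another request.
my $request = qr/
    (                           # start capture
        $method                 # start phrase
        (?: (?!$method) . )*?   # any character that does not begin a method
        Content-Type:\simage\/jpeg
    )                           # end capture
/sx;

# Collect every occurrence; the g flag resumes after the previous match.
my @matches;
while ($text =~ /$request/g) {
    push @matches, $1;
}

print "$_\n---\n" for @matches;
```

With this sample, the text/html request is skipped and only the two image/jpeg requests end up in @matches.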

2012-07-07 06:29:20

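Thanks for posting, fxzuz, this looks close to what I want. Does this solution match every occurrence of the regex capture in a slurped file, or only the first/last one before quitting? I need to match all occurrences anywhere in a large slurped file. Thanks – user1508213 2012-07-07 19:03:49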

Yes, of course. Just add g (global) to your regex and process it in a loop. I have added that code to the answer. – 2012-07-07 20:41:16

Thanks, I'll test it out. Thanks again! – user1508213 2012-07-07 21:04:56
