Global regex match hangs

I have the following Perl code:
# $content is the text of a webpage
while ($content =~ /rgRow.*?<td>(.*?)<\/td><td.*?>(.*?)<\/td><td.*?>(.*?)<\/td><td.*?>.*?<\/td><td.*?>(.*?)<\/td><td.*?><nobr>(.*?)<\/nobr><\/td>/sg) {
# do stuff
}
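I have determined that the code hangs on this regex match. It gets through 2-3 iterations of the while loop and then hangs. I left it running for about 30 minutes and it never continued.
What could be the problem?
The purpose of the code is to go through some HTML and extract some data from it.
Here is the HTML that I set $content to: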
<tbody>
<tr class="rgRow InnerItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__0">
<td>CONSIDERATION OF REPORTS SUBMITTED BY STATES PARTIES UNDER ARTICLE 9 OF THE CONVENTION : SECOND PERIODIC REPORT OF STATES PARTIES DUE IN 1974/MOROCCO</td><td>State party's report</td><td>CERD</td><td>Morocco</td><td>CERD/C/R.65/Add.1</td><td><nobr>21 Feb 1974</nobr></td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl04_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=CERD%2fC%2fR.65%2fAdd.1&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">CERD/C/R.65/Add.1</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__1">
<td>CONSIDERATION OF REPORTS SUBMITTED BY STATES PARTIES UNDER ARTICLE 9 OF THE CONVENTION : INITIAL REPORTS OF STATES PARTIES WHICH ARE DUE IN 1972/MOROCCO</td><td>State party's report</td><td>CERD</td><td>Morocco</td><td>CERD/C/R.33/Add.1</td><td><nobr>17 Jan 1972</nobr></td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl06_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=CERD%2fC%2fR.33%2fAdd.1&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">CERD/C/R.33/Add.1</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__2">
<td>Annex I to ALGERIA's Report</td><td>Annex to State party report</td><td>CERD</td><td>Algeria</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl08_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fAIS%2fDZA%2f13691&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_AIS_DZA_13691_E.doc</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/AIS/DZA/13691</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__3">
<td>Annex II to ALGERIA's report</td><td>Annex to State party report</td><td>CERD</td><td>Algeria</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl10_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fAIS%2fDZA%2f13692&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_AIS_DZA_13692_E.doc</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/AIS/DZA/13692</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__4">
<td>Annex III to ALGERIA's report</td><td>Annex to State party report</td><td>CERD</td><td>Algeria</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl12_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fAIS%2fDZA%2f13693&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_AIS_DZA_13693_E.doc</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/AIS/DZA/13693</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__5">
<td>CERD-C-NZ-18-20_Annexes</td><td>Annex to State party report</td><td>CERD</td><td>New Zealand</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl14_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fADR%2fNZL%2f13731&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_ADR_NZL_13731_E.doc</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/ADR/NZL/13731</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__6">
<td>CERD.C.RUS.20-22_Annex1</td><td>Annex to State party report</td><td>CERD</td><td>Russian Federation</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl16_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fADR%2fRUS%2f13732&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">R</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_ADR_RUS_13732_R.doc</td><td style="display:none;">INT/CERD/ADR/RUS/13732</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__7">
<td>Annex to State party report</td><td>Annex to State party report</td><td>CERD</td><td>Poland</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl18_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fADR%2fPOL%2f15432&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;">E</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_ADR_POL_15432_E.doc</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/ADR/POL/15432</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__8">
<td>Annexe X</td><td>Annex to State party report</td><td>CERD</td><td>Belgium</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl20_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fADR%2fBEL%2f15561&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;"> </td><td style="display:none;">F</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_ADR_BEL_15561_F.pdf</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/ADR/BEL/15561</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="ctl00_PlaceHolderMain_radResultsGrid_ctl00__9">
<td>Annexe XI</td><td>Annex to State party report</td><td>CERD</td><td>Belgium</td><td> </td><td> </td><td>
<a id="ctl00_PlaceHolderMain_radResultsGrid_ctl00_ctl22_MoreDocs" title="View document" href="http://tbinternet.ohchr.org/_layouts/treatybodyexternal/Download.aspx?symbolno=INT%2fCERD%2fADR%2fBEL%2f15562&Lang=en" target="_blank" style="text-decoration:underline;">View document</a>
</td><td style="display:none;"> </td><td style="display:none;">F</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT_CERD_ADR_BEL_15562_F.pdf</td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;"> </td><td style="display:none;">INT/CERD/ADR/BEL/15562</td><td style="display:none;"> </td><td style="display:none;">True</td>
</tr>
</tbody>
I tried the following line instead to see how it would go:
while ($content =~ m/rgRow.+?<td>(.+?)<\/td><td>(.+?)<\/td><td>(.+?)<\/td><td>(.+?)<\/td><td>(.+?)<\/td><td>(.+?)<\/td>/gs)
The original code is not mine.
Please show the HTML you are trying to parse. In any case, a regex is not the right tool for parsing HTML; why not use an HTML parser?
[Required reading for anyone trying to parse XML/HTML with a regex](http://stackoverflow.com/a/1732454/18157). In short: don't parse HTML/XML with a regex; use a proper parser.
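For what it's worth, here is a rough sketch of what the parser route could look like. It assumes the CPAN module HTML::TreeBuilder is installed (it is not part of core Perl), and the two-row sample, the `@rows` array, and the choice to keep the first six cells are illustrative assumptions, not something taken from the original script.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, assumed to be installed

# Hypothetical two-row sample shaped like the HTML in the question.
my $content = <<'HTML';
<table><tbody>
<tr class="rgRow InnerItemStyle" id="r0">
<td>Title A</td><td>State party's report</td><td>CERD</td><td>Morocco</td><td>CERD/C/R.65/Add.1</td><td><nobr>21 Feb 1974</nobr></td><td>link</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="r1">
<td>Title B</td><td>Annex to State party report</td><td>CERD</td><td>Algeria</td><td> </td><td> </td><td>link</td>
</tr>
</tbody></table>
HTML

my $tree = HTML::TreeBuilder->new_from_content($content);

my @rows;
# Every <tr> whose class attribute contains the word rgRow.
for my $tr ($tree->look_down(_tag => 'tr', class => qr/\brgRow\b/)) {
    # Text of each <td> in the row, in document order.
    my @cells = map { $_->as_text } $tr->look_down(_tag => 'td');
    push @rows, [ @cells[0 .. 5] ];   # keep the first six cells
}
$tree->delete;   # release the parse tree

print join(' | ', @$_), "\n" for @rows;
```

Since the parser walks the document once, a row that lacks a date cannot make it stall; it simply yields an empty or blank cell.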
I agree with the above, but if you do need to do it this way, how about breaking up that nasty pattern with 'qr'? It would be much easier to read. – zdim
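A minimal sketch of the 'qr' idea, using only core Perl. Judging from the sample HTML, a likely cause of the hang is that rows from the third one onward have no `<nobr>` element, so once the dated rows are consumed the lazy `.*?` pieces (with `/s`) keep retrying ever longer stretches of the page. The sketch swaps them for negated character classes that cannot cross a tag boundary. Treating `<nobr>` as optional is an assumption about what the script wants, and the names `$open`, `$cell`, `$skip`, `$close`, `$row` and the two-row sample are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-row sample shaped like the HTML in the question:
# the first row has a <nobr> date, the second has blank cells instead.
my $content = <<'HTML';
<tr class="rgRow InnerItemStyle" id="r0">
<td>Title A</td><td>State party's report</td><td>CERD</td><td>Morocco</td><td>CERD/C/R.65/Add.1</td><td><nobr>21 Feb 1974</nobr></td><td>link</td>
</tr><tr class="rgRow InnerAlernatingItemStyle" id="r1">
<td>Title B</td><td>Annex to State party report</td><td>CERD</td><td>Algeria</td><td> </td><td> </td><td>link</td>
</tr>
HTML

# Building blocks. [^<]* and [^>]* cannot run past a tag boundary,
# so a failed attempt gives up quickly instead of rescanning the page.
my $open  = qr{<td[^>]*>};   # opening td tag, with or without attributes
my $cell  = qr{([^<]*)};     # captured cell text
my $skip  = qr{[^<]*};       # cell text we do not need
my $close = qr{</td>};

my $row = qr{
    rgRow [^>]* > \s*                              # rest of the opening tr tag
    $open $cell $close                             # capture 1: title
    $open $cell $close                             # capture 2: document type
    $open $cell $close                             # capture 3: committee
    $open $skip $close                             # country, skipped like the original
    $open $cell $close                             # capture 4: symbol
    $open (?:<nobr>)? $cell (?:</nobr>)? $close    # capture 5: date, possibly blank
}x;

my @rows;
while ($content =~ /$row/g) {
    push @rows, [ $1, $2, $3, $4, $5 ];
}

print join(' | ', @$_), "\n" for @rows;
```

This only holds up while the cells contain plain text; a cell with nested markup would need a looser piece or, better, a real parser.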