Selecting HTML text elements with a regular expression?

I want to find © in an HTML document and, basically, get the entity that holds the copyright.

The copyright line appears in a few different forms:

<p class="bg-copy">&copy; 2011 The New York Times Company</p>

or

<a href="http://www.nytimes.com/ref/membercenter/help/copyright.html"> 
&copy; 2011</a> 
<a href="http://www.nytco.com/">The New York Times Company</a>

or

<br>Published since 1996<br>Copyright &copy; CounterPunch<br> 
All rights reserved.<br>

I want to ignore the date and any tags in between and get just "The New York Times Company" or "CounterPunch".

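I haven't found much about using regular expressions with JavaScript or jQuery, and I suspect this approach could cause major trouble. If there is a better way, please let me know.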

Source

2011-10-30 tarayani

Don't use a regular expression; use the DOM tree to find what you are looking for instead. Some links: http://www.howtocreate.co.uk/tutorials/javascript/dombasics – FailedDev

The usual response you will get is: please don't use regular expressions for parsing in JS. Use a JS parser. The question is: can you? – ZenMaster

@FailedDev almost got it... – ZenMaster
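The DOM-based approach suggested in the comments could be sketched roughly as follows. This is only an illustration, not code from the thread: `findCopyrightTextNodes` is a hypothetical helper name, and it assumes the page has already been parsed, so that `&copy;` appears in text nodes as the decoded character U+00A9. It uses only the standard DOM properties `nodeType`, `nodeValue` and `childNodes`.

```javascript
// Sketch only: findCopyrightTextNodes is a hypothetical helper, not a library function.
const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE in the DOM specification

// Recursively collect every text node under `root` that contains the © character.
function findCopyrightTextNodes(root) {
  const hits = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      // In a parsed document, &copy; has already been decoded to U+00A9.
      if (node.nodeValue.indexOf('\u00A9') !== -1) {
        hits.push(node);
      }
      return;
    }
    const children = node.childNodes || [];
    for (let i = 0; i < children.length; i++) {
      walk(children[i]);
    }
  })(root);
  return hits;
}
```

In a browser this would be called as `findCopyrightTextNodes(document.body)`. From each returned node you can then inspect `nodeValue`, `parentNode` and `nextSibling` to apply whatever heuristics fit the page, for example looking at the following link when the text node holds only the year.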

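For a reliable solution you will probably need a combination of DOM navigation and some heuristics. Your examples can be solved with a regular expression, but there are many more possible scenarios...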

&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)

This works for your three samples, but only for them and similar cases.

See it on Rubular.

Explanation:

&copy; // the copyright entity 
[\s\d]* // followed by any whitespace or digits 
(?:<\/.+?>[^>]*>)? // optionally followed by a closing tag and another opening tag 
([^<]*) // then capture everything up to the next tag

See this answer for how to use it in JavaScript with jQuery. Basically, you can use the match(/regex/) function:

var result = string.match(/&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/)
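To make the use of the capture group concrete, here is a small sketch that runs the same pattern against the three samples from the question. The helper name `extractHolder` and the `samples` array are illustrative assumptions; the pattern itself is unchanged. Note that `match()` returns `null` when nothing matches, and that the captured text is in element 1 of the result.

```javascript
// Same pattern as above; group 1 holds the text after the symbol, date and tags.
const COPYRIGHT_RE = /&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/;

// extractHolder is a hypothetical helper name used only for illustration.
function extractHolder(html) {
  const m = html.match(COPYRIGHT_RE);
  // match() returns null when there is no match, so guard before reading group 1.
  return m ? m[1].trim() : null;
}

// The three snippets from the question, as raw (undecoded) HTML strings.
const samples = [
  '<p class="bg-copy">&copy; 2011 The New York Times Company</p>',
  '<a href="http://www.nytimes.com/ref/membercenter/help/copyright.html"> \n' +
    '&copy; 2011</a> \n' +
    '<a href="http://www.nytco.com/">The New York Times Company</a>',
  '<br>Published since 1996<br>Copyright &copy; CounterPunch<br> \n' +
    'All rights reserved.<br>'
];

samples.forEach(function (html) {
  console.log(extractHolder(html));
});
```

This only works on the raw HTML source, where the symbol is still written as `&copy;`. Text read back from the DOM contains the decoded © character instead, so the pattern would need to be adapted for that case.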

Source

2011-10-30 19:48:46 morja

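Thanks, I found that this works, but I decided to find the "©" encoding in the page and parse that element instead. However, now I have run into a problem: http://stackoverflow.com/questions/8282250/jquery-contains-returns-nothing-for-html-encoding – tarayani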

Also, would you mind breaking down your regex? I don't fully understand it. And how would I use it in JavaScript? – tarayani

See my update. – morja

$('*:contains(©)').filter(function(){ 
    return $(this).find('*:contains(©)').length == 0 
}).text();

Demo here: http://jsfiddle.net/unloco/kGPYA/
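The text returned by the jQuery snippet still contains the symbol and, usually, the year (for example "© 2011 The New York Times Company"). A minimal sketch of trimming that down is shown here. `stripCopyrightPrefix` is a hypothetical helper, it assumes the text is already decoded (a real © character rather than `&copy;`), and cutting at the first line break is just a heuristic that happens to suit the samples in the question.

```javascript
// Sketch only: stripCopyrightPrefix is a hypothetical helper, not part of jQuery.
function stripCopyrightPrefix(text) {
  const idx = text.indexOf('\u00A9'); // position of the © character
  if (idx === -1) {
    return null; // no copyright symbol in this text
  }
  // Drop everything up to and including the symbol, then any leading spaces or digits.
  const rest = text.slice(idx + 1).replace(/^[\s\d]+/, '');
  // Heuristic: keep only the first line of what remains.
  return rest.split(/\r?\n/)[0].trim();
}

console.log(stripCopyrightPrefix('\u00A9 2011 The New York Times Company'));
console.log(stripCopyrightPrefix('Copyright \u00A9 CounterPunch'));
```

An empty string means the element held only the symbol and the date, as in the second sample from the question, where the holder's name sits in a sibling link and has to be found by DOM navigation.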

Source

2011-11-29 13:38:27 UnLoCo
