2012-09-19 107 views
1

Possible duplicate:
RegEx match open tags except XHTML self-contained tags

How to match with a regular expression outside of HTML tags

How can I match a word only where it appears outside HTML tags, instead of matching every occurrence of the word?

For example:

<div id="mariano mariano mariano" nota="mariano/mariano">mariano was looking forward Mariano. I want to match this "Mariano" too. Mariano</div> 

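In this example, I want to match every "Mariano" that is outside the tag itself, that is, not the occurrences inside the id and nota attribute values.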

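I think the key to this problem is to look ahead from the word: if a "<" comes before the next ">", the word is in the text and should match; if the regex finds a ">" before a "<", the word is inside a tag. However, I have not managed to produce a regex that does this.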
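The idea described above can be sketched with a negative lookahead: match the term only when the next angle bracket after it is not a closing ">". This is only a rough sketch, not a robust HTML parser. The helper names `escapeRegExp` and `outsideTagsRegex` are made up for illustration, and the pattern assumes that attribute values and text never contain a literal "<" or ">".

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex that matches `term` only when it is NOT
// followed by "non-bracket characters, then >" (which would mean we are inside a tag).
function outsideTagsRegex(term) {
    return new RegExp(escapeRegExp(term) + "(?![^<>]*>)", "gi");
}

var html = '<div id="mariano mariano mariano" nota="mariano/mariano">' +
    'mariano was looking forward Mariano. I want to match this "Mariano" too. Mariano</div>';

// Only the four occurrences in the text are matched; the five inside the tag are skipped.
var matches = html.match(outsideTagsRegex("mariano"));
console.log(matches.length); // 4

// The attribute values stay untouched when replacing.
var replaced = html.replace(outsideTagsRegex("mariano"), "Pedro");
console.log(replaced);
```

A single lookahead avoids the need for lookbehind and for three separate patterns, but it will still misfire on comments, script blocks, or unescaped brackets in the text.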

I tried combining this regex, (?<=^|>)[^><]+?(?=<|$), with another one, without success. My final, lowest-quality solution is:

<!-- language: lang-js --> 
var searchFor = new RegExp("((!?<=^|>)" + termino + ")","ig"); 
var searchFor2 = new RegExp("(" + termino + "(?=<|$))","ig"); 
var searchFor3 = new RegExp("(!?<=^|[\\s\\.;,])" + termino + "(?=[\\s\\.;,]|$)","ig"); 

But those three do not cover all the cases.

Edit: I'm using JavaScript:

<script> 
container.find("p, span, div, .texto").each(function() { 
var containerText = $(this).html(); 
for (var i = 0; i < terms.length; i++) { 
    var termino = terms[i]; 
    // 1st case: ">termino" is replaced with ">Pedro" 
    var searchFor = new RegExp("((!?<=^|>)" + termino + ")","ig"); 
    containerText = containerText.replace(searchFor,">Pedro"); 
    // 2nd case: "termino<" is replaced with "Pedro" 
    var searchFor2 = new RegExp("(" + termino + "(?=<|$))","ig"); 
    containerText = containerText.replace(searchFor2,"Pedro"); 
    // 3rd case: "[\.\s,;:]termino[\.\s,;:]" 
    var searchFor3 = new RegExp("(!?<=^|[\\s\\.;,])" + termino + "(?=[\\s  \\.;,]|$)","ig"); 
    containerText = containerText.replace(searchFor3," Pedro"); 
}; 
$(this).html(containerText); 
}); 
</script> 
+2

Please don't try to parse HTML with regular expressions (http://stackoverflow.com/a/1732454/451590)

+0

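Give some examples of the markup and of the strings we are looking for. Also, all the text in a document is at least inside the 'body' element.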

+0

Regular expressions are not the way to parse HTML. Take a look at http://htmlparsing.com for some starting points.

Answers

1

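A few things:

  1. Welcome to Stack Overflow!
  2. Please search for existing questions before asking. There are plenty of results for parsing XML with regular expressions.
  3. Please don't use regular expressions to parse XML/HTML! Try XPath: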

    var termino = "Mariano"; // example value; define it however you did before

    // Select every text node directly inside a div whose content contains termino.
    // Note: contains() is case-sensitive, and termino must not contain a double quote.
    var iterator = document.evaluate('//div/text()[contains(., "' + termino + '")]', document, null, XPathResult.UNORDERED_NODE_ITERATOR_TYPE, null);

    try {
        // init thisNode to the first item in the iterator
        var thisNode = iterator.iterateNext();

        // go through all items, alert their content (which should contain termino)
        while (thisNode) {
            alert(thisNode.textContent);
            thisNode = iterator.iterateNext();
        }
    }
    catch (e) {
        // iterateNext() throws if the document is modified while iterating
        console.error('Error: Document tree modified during iteration ' + e);
    }
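If the goal is to replace the word rather than just find it, one possible alternative to XPath is to walk the DOM and touch only text nodes, so attribute values such as id and nota are never seen. The following is a minimal sketch under that assumption; `replaceInTextNodes` and `escapeRegExp` are hypothetical helper names, and the function relies only on the standard `nodeType`, `nodeValue` and `childNodes` properties.

```javascript
// Hypothetical helper: escape regex metacharacters in the search term.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

var TEXT_NODE = 3; // same value as Node.TEXT_NODE in browsers

// Recursively replace `term` (case-insensitively) in every text node under `node`.
// Element attributes are never touched because only text nodes are rewritten.
function replaceInTextNodes(node, term, replacement) {
    if (node.nodeType === TEXT_NODE) {
        var pattern = new RegExp(escapeRegExp(term), "gi");
        // A function is used so that "$" in the replacement is taken literally.
        node.nodeValue = node.nodeValue.replace(pattern, function () {
            return replacement;
        });
        return;
    }
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        replaceInTextNodes(children[i], term, replacement);
    }
}
```

In a browser this could be called as `replaceInTextNodes(document.body, "mariano", "Pedro")`. The sketch also descends into script and style elements, so you may want to skip those in real use.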