2017-08-28 30 views

Surround every matching word inside XML tags with braces

I have an HTML string like the following:

<whatevertag do-not-change-this="word" or-this-word=""> 
    these words should be replaced with a word inside braces, 
    and also the same word thing for 
    <whatevertag> 
     the nested tags that has the word 
    </whatevertag> 
</whatevertag> 

I am trying to produce this output:

<whatevertag do-not-change-this="word" or-this-word=""> 
    these {word}s should be replaced with a {word} inside braces, 
    and also the same {word} thing for 
    <whatevertag> 
     the nested tags that has the {word} 
    </whatevertag> 
</whatevertag> 

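I have tried the expression (>[^>]*?)(word)([^<]*?<) with the replacement $1{$2}$3. Surprisingly (at least to me), it only works for the first match in each run of text, and the output is: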

<whatevertag do-not-change-this="word" or-this-word=""> 
    these {word}s should be replaced with a word inside braces, 
    and also the same word thing for 
    <whatevertag> 
     the nested tags that has the {word} 
    </whatevertag> 
</whatevertag> 

Why does this happen, and how can I fix it?

Answers


The reason your regex is unsuccessful is:

(>[^>]*?)     # read '>', then lazily any character except '>' 
(word)      # until you encounter 'word' 
([^<]*?<)     # then lazily read any character except '<' until you find a '<' 

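So once 'word' has been captured, your regex keeps reading up to the first '<'. Every match therefore has to begin at a '>' and end at the next '<', which allows only one match per run of text between tags. That is why the second 'word' is not captured.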
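The behaviour described above can be reproduced with a short Python sketch. Python is an assumption here (the question does not name a host language), and Python's re module writes the replacement as \1{\2}\3 rather than $1{$2}$3. The sample string is the one from the question.

```python
import re

# Sample input from the question.
html = (
    '<whatevertag do-not-change-this="word" or-this-word="">\n'
    '    these words should be replaced with a word inside braces,\n'
    '    and also the same word thing for\n'
    '    <whatevertag>\n'
    '     the nested tags that has the word\n'
    '    </whatevertag>\n'
    '</whatevertag>'
)

# The asker's pattern: each match starts at a '>' and ends at the next '<'.
pattern = re.compile(r'(>[^>]*?)(word)([^<]*?<)')

# Python spells the question's "$1{$2}$3" as r'\1{\2}\3'.
result = pattern.sub(r'\1{\2}\3', html)
print(result)

# Only one replacement happens per run of text between tags,
# because the rest of that text is consumed by the third group.
print(result.count('{word}'))  # 2
```

Since each match swallows everything up to the next '<', the search resumes after that '<' and the remaining occurrences in the same text are never examined.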

What you could use is:

(?:(?!word).)+(word) 

Explanation:

(?:       # Do not capture 
(?!word).)+     # Negative lookahead for word. Read 1 char 
(word)      # until you find 'word' 
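As a rough illustration of how this pattern walks through the input, here is a small Python sketch (Python is an assumption, not something the question specifies). The second pattern, with an extra capturing group around the prefix, is an adaptation made here so that a replacement can put the prefix back; it is not part of the pattern above. Note that '.' does not match newlines in Python unless re.DOTALL is passed, so a single line is used.

```python
import re

text = 'these words should be replaced with a word inside braces'

# The pattern from the answer: consume characters one at a time as long as
# 'word' does not start at the current position, then capture 'word'.
pattern = re.compile(r'(?:(?!word).)+(word)')
starts = [m.start(1) for m in pattern.finditer(text)]
print(starts)  # [6, 38]

# Adaptation (assumption): capture the prefix too, so it can be re-emitted.
wrapping = re.compile(r'((?:(?!word).)+)(word)')
replaced = wrapping.sub(r'\1{\2}', text)
print(replaced)

# Caveat: the pattern is not tag-aware, so attribute values match as well.
attr_hit = pattern.search('<tag attr="word">') is not None
print(attr_hit)  # True
```

Unlike the original expression, this finds every occurrence in a line, but it makes no distinction between text and markup.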

example

Edit: Rereading your question, you made it clear that you want to capture everything outside the tags. Take a look at: example 2

The regex is:

((?!word)[^>])+(word)([^<]+) # read all characters, except 
          # '>' until you encounter 'word' 
          # read 'word' 
          # capture all following characters, except '<' 

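A problem arises if the tag has attributes that contain 'word', such as '<tag some-attribute="word">'. I need it replaced only in the text, which is why I used the expression I did... I really don't know who downvoted my question, or why. – Husamuddin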


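Let me work out an example. I did not downvote your question. If someone downvotes an answer or a question, they should give an explanation. IMHO. –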


What would the replacement look like? I would be very grateful if you could provide it. – Husamuddin
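One possible replacement, sketched in Python, uses a different technique from the expressions in the thread: match 'word' only when it is not followed by a '>' before the next '<', which means it is not inside a tag. This is a sketch under the assumptions that the markup is well-formed and that attribute values never contain '<' or '>'; for arbitrary HTML a real parser would be the safer choice. In engines with $-style references the replacement would likely be written {$0} or {$&}, depending on the flavor.

```python
import re

# Sample input from the question.
html = (
    '<whatevertag do-not-change-this="word" or-this-word="">\n'
    '    these words should be replaced with a word inside braces,\n'
    '    and also the same word thing for\n'
    '    <whatevertag>\n'
    '     the nested tags that has the word\n'
    '    </whatevertag>\n'
    '</whatevertag>'
)

# 'word' is accepted only if the characters after it do NOT reach a '>'
# without first passing a '<', i.e. only if it sits outside a tag.
outside_tags = re.compile(r'word(?![^<>]*>)')

# \g<0> is Python's reference to the whole match.
result = outside_tags.sub(r'{\g<0>}', html)
print(result)
print(result.count('{word}'))  # 4
```

The attribute value "word" and the attribute name or-this-word are left alone because a '>' follows them before any '<', while all four occurrences in the text are wrapped.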