Regex to remove HTML tags

Input:

<td> 
<span> 
<span>spanaaa</span> 
<span class="1">spanbbb</span> 
<span class="" style="">spanccc</span> 
<span style="display:none">spanddd</span> 

<div>divaaa</div> 
<div class="1">divbbb</div> 
<div class="" style="">divccc</div> 
<div style="display:none">divddd</div> 
</span> 
</td>

I need a regex (or some other method) to get the values of the elements that do not have the attribute style="display:none".

Output:

spanaaa
spanbbb
spanccc

divaaa
divbbb
divccc


2012-04-18 user1340363

[Regex is a very poor tool for parsing anything but trivial HTML.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – 2012-04-18 05:22:44

Do you want a server-side or a client-side solution? – 2012-04-18 05:26:49

How do you want to do it? One way is to use [XPath](http://www.w3schools.com/xpath/) – 2012-04-18 05:37:53
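Along the lines of these comments, here is a sketch that walks a parsed tree instead of using a regex. It assumes Python's standard `xml.etree.ElementTree` and that the markup is well-formed XML, as the question's sample happens to be; real-world HTML often is not and would need an HTML parser. It is a plain tree walk, not a full XPath query. `is_hidden` and `visible_texts` are hypothetical helper names, and only inline `style` attributes are checked (not CSS classes or stylesheets).

```python
import xml.etree.ElementTree as ET


def is_hidden(el):
    # Assumption: "hidden" means the inline style contains display:none (spaces ignored).
    style = (el.get("style") or "").replace(" ", "").lower()
    return "display:none" in style


def visible_texts(el):
    """Yield non-blank text of el and its descendants, skipping hidden subtrees."""
    if is_hidden(el):
        return
    if el.text and el.text.strip():
        yield el.text.strip()
    for child in el:
        yield from visible_texts(child)
        # A child's tail text belongs to el, which is visible, so keep it.
        if child.tail and child.tail.strip():
            yield child.tail.strip()


html = """<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>"""

root = ET.fromstring(html)
print(list(visible_texts(root)))
# ['spanaaa', 'spanbbb', 'spanccc', 'divaaa', 'divbbb', 'divccc']
```

Because whole subtrees are skipped, a hidden parent also hides the text of its children, which a tag-by-tag regex cannot express.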

Regex is a poor choice for this (because HTML is so unpredictable), but you can try this:

<div(?!\s*style="display:none")[^>]*>(.*?)</div>
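A quick check of this pattern, sketched with Python's `re` module (the answer does not name a language, so Python is an assumption here). The sample string is the `<div>` part of the question's input.

```python
import re

# Bohemian's pattern: a <div whose attributes do NOT start with style="display:none",
# then capture the text up to the first </div>.
DIV_PATTERN = re.compile(r'<div(?!\s*style="display:none")[^>]*>(.*?)</div>')

html = """<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>"""

print(DIV_PATTERN.findall(html))  # ['divaaa', 'divbbb', 'divccc']
```

The negative lookahead only inspects the text directly after `<div`, so `<div class="x" style="display:none">` would still be matched. The pattern also covers `<div>` only; the `<span>` elements in the question would need their own pattern.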


2012-04-18 05:41:53 Bohemian

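Is the `=` in `(?!=` a typo? It isn't part of the lookahead syntax itself (that is just `(?=` or `(?!`), so it is treated as a literal `=`, and there will never be a literal `=` at that position, so the negative lookahead always succeeds and filters nothing. – 2012-04-19 21:10:54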

@AlanMoore My bad! Thanks for catching that; I've corrected the regex in my answer. – Bohemian 2012-04-19 23:12:36

Pattern [.NET flavor]

(?<=<\w+ [^<>]*?\w+=")(?!display:none)(?<mt>[^"<>]+)(?=") 

Options: ^ and $ match at line breaks

Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<\w+ [^<>]*?\w+=")»
    Match the character “<” literally «<»
    Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match the character “ ” literally « »
    Match a single character NOT present in the list “<>” «[^<>]*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
    Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match the characters “="” literally «="»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!display:none)»
    Match the characters “display:none” literally «display:none»
Match the regular expression below and capture its match into backreference with name “mt” «(?<mt>[^"<>]+)»
    Match a single character NOT present in the list “"<>” «[^"<>]+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
    Match the character “"” literally «"»

Pattern [PCRE flavor]

(<\w+ [^<>]*?\w+=")(?!display:none)([^"<>]+)(?=") 

Options: ^ and $ match at line breaks

Match the regular expression below and capture its match into backreference number 1 «(<\w+ [^<>]*?\w+=")»
    Match the character “<” literally «<»
    Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match the character “ ” literally « »
    Match a single character NOT present in the list “<>” «[^<>]*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
    Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match the characters “="” literally «="»
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!display:none)»
    Match the characters “display:none” literally «display:none»
Match the regular expression below and capture its match into backreference number 2 «([^"<>]+)»
    Match a single character NOT present in the list “"<>” «[^"<>]+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")»
    Match the character “"” literally «"»
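To see what the PCRE-style pattern actually captures, here is a sketch using Python's `re` (an assumption; `re` accepts this form because it uses a capturing group instead of the variable-length lookbehind of the .NET version, which `re` rejects as not fixed-width). Run against the question's input, group 2 holds attribute values such as `1`, not element text such as `spanbbb`.

```python
import re

# PCRE-style variant: group 1 is the opening tag up to attr=", group 2 is the attribute value.
ATTR_PATTERN = re.compile(r'(<\w+ [^<>]*?\w+=")(?!display:none)([^"<>]+)(?=")')

html = """<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>"""

values = [m.group(2) for m in ATTR_PATTERN.finditer(html)]
print(values)  # ['1', '1']
```

Empty attribute values (`class=""`) are skipped because `[^"<>]+` needs at least one character, and `style="display:none"` is rejected by the negative lookahead.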


2012-04-18 05:44:30 Cylian

// RegexOptions.Singleline lets . match newlines, so a plain .*? already spans multi-line <div> bodies.
input = Regex.Replace(input, @"<div style=""display:none"">.*?</div>", string.Empty, RegexOptions.Singleline);

Here `input` is the string containing the HTML. Try this regex; it should work, InshAllah!
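A rough Python equivalent of this `Regex.Replace` call, offered as a sketch (Python is an assumption; `re.DOTALL` plays the role of `RegexOptions.Singleline`). It only deletes the hidden `<div>`; the hidden `<span>` and all remaining tags are left in place.

```python
import re

# Delete each <div style="display:none">...</div>; DOTALL lets . cross newlines.
HIDDEN_DIV = re.compile(r'<div style="display:none">.*?</div>', re.DOTALL)

html = """<div>divaaa</div>
<div style="display:none">div
ddd</div>
<span style="display:none">spanddd</span>"""

cleaned = HIDDEN_DIV.sub("", html)
print(cleaned)
```

The match is on the exact attribute text, so `style="display: none"` (with a space) or extra attributes before `style` would not be removed.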


2012-12-06 07:07:17

This C# version is about 8 times faster than parsing with a regex, and you can easily convert it to whatever language you want.

public static string StripTagsCharArray(string source)
{
    // Output buffer: the result can never be longer than the input.
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    // True while we are between a '<' and the following '>'.
    bool inside = false;

    for (int i = 0; i < source.Length; i++)
    {
        char let = source[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        // Copy only characters that lie outside of a tag.
        if (!inside)
        {
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}
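Taking up the suggestion to port it, a Python sketch of the same character-scanning idea (the name `strip_tags` is ours, not the answer's). Like the C# original, it drops everything between `<` and `>`, so the text of `style="display:none"` elements is kept, and a literal `<` or `>` in text or inside an attribute value will confuse it.

```python
def strip_tags(source: str) -> str:
    """Return source with every <...> tag removed, using one left-to-right scan."""
    out = []          # characters that lie outside any tag
    inside = False    # True between a '<' and the next '>'
    for ch in source:
        if ch == "<":
            inside = True
        elif ch == ">":
            inside = False
        elif not inside:
            out.append(ch)
    return "".join(out)


print(strip_tags('<span class="1">spanbbb</span>'))          # spanbbb
print(strip_tags('<div style="display:none">divddd</div>'))  # divddd
```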


2013-08-13 14:04:01 AuthorProxy
