2012-04-18 64 views
0

Regular expression to remove HTML tags

Input:

<td> 
<span> 
<span>spanaaa</span> 
<span class="1">spanbbb</span> 
<span class="" style="">spanccc</span> 
<span style="display:none">spanddd</span> 

<div>divaaa</div> 
<div class="1">divbbb</div> 
<div class="" style="">divccc</div> 
<div style="display:none">divddd</div> 
</span> 
</td> 

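I need a regular expression, or some other method, to get the values of the elements that do not have the attribute style="display:none".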

Output:

spanaaa
spanbbb
spanccc

divaaa
divbbb
divccc

+2

[Regular expressions are a very poor tool to use for parsing anything but trivial HTML.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – 2012-04-18 05:22:44

+0

Do you want a server-side solution or a client-side solution? – 2012-04-18 05:26:49

+0

How do you want to do this? One approach is to use [xpath](http://www.w3schools.com/xpath/) – 2012-04-18 05:37:53
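The parser-based route suggested in the comments can be sketched as follows. This is a minimal illustration in Python, not something from the thread: it uses the standard library's `xml.etree.ElementTree`, which only works here because the sample happens to be well-formed XML, and the names `SAMPLE` and `visible_texts` are made up for the example. The path expression `.//*` selects every descendant; the style check is done in Python rather than in the path itself.

```python
import xml.etree.ElementTree as ET

# The sample markup from the question (assumed to be well-formed XML).
SAMPLE = """<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>"""


def visible_texts(markup):
    """Return the direct text of every element not styled display:none."""
    root = ET.fromstring(markup)  # raises ParseError on malformed markup
    texts = []
    for element in root.findall(".//*"):  # every descendant, in document order
        style = element.get("style", "").replace(" ", "").lower()
        if "display:none" in style:
            continue  # skips the hidden element itself, not its descendants
        text = (element.text or "").strip()
        if text:
            texts.append(text)
    return texts


print(visible_texts(SAMPLE))
```

For the sample input this yields the six values listed under "Output". Real-world HTML is rarely well-formed XML, so a dedicated HTML parser would be needed there; the sketch also ignores text that follows a child element and does not hide descendants of a hidden element.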

Answers

0

A regular expression is a poor choice for this (because of the vagaries of HTML), but you could try this:

<div(?!\s*style="display:none")[^>]*>(.*?)</div> 
+0

Is the "=" in "(?!=" a typo? It is not part of the lookahead construct itself (i.e. "(?=" or "(?!"), and it will never match a literal "=" at that position. – 2012-04-19 21:10:54

+0

@AlanMoore My bad! Thanks for picking that up; I have corrected the regex in my answer. – Bohemian 2012-04-19 23:12:36
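To see how the corrected pattern from this answer behaves on the question's input, here is a small Python sketch. `DIV_PATTERN` is the answer's pattern unchanged; `TAG_PATTERN` is a hypothetical generalisation (not from the answer) that also accepts `span` and uses the backreference `\1` to pair the closing tag. The function names are illustrative only.

```python
import re

# The answer's pattern, unchanged: a <div> whose tag name is not immediately
# followed by style="display:none".
DIV_PATTERN = re.compile(r'<div(?!\s*style="display:none")[^>]*>(.*?)</div>')

# Hypothetical generalisation: div or span, closing tag matched through \1.
TAG_PATTERN = re.compile(
    r'<(div|span)(?!\s*style="display:none")[^>]*>(.*?)</\1>'
)

SAMPLE = """<td>
<span>
<span>spanaaa</span>
<span class="1">spanbbb</span>
<span class="" style="">spanccc</span>
<span style="display:none">spanddd</span>

<div>divaaa</div>
<div class="1">divbbb</div>
<div class="" style="">divccc</div>
<div style="display:none">divddd</div>
</span>
</td>"""


def visible_div_texts(html):
    """Text of each <div> accepted by the answer's pattern."""
    return DIV_PATTERN.findall(html)


def visible_texts(html):
    """Text of each <div> or <span> accepted by the generalised pattern."""
    return [match.group(2) for match in TAG_PATTERN.finditer(html)]


print(visible_div_texts(SAMPLE))
print(visible_texts(SAMPLE))
```

Two limitations are worth keeping in mind. The negative lookahead only inspects what comes directly after the tag name, so `<div class="x" style="display:none">` would still be matched. And because `.` does not match a newline by default, the outer multi-line `<span>` in the sample is skipped, which is what makes the generalised pattern work on this particular input.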

1

Pattern [.NET flavor]

(?<=<\w+ [^<>]*?\w+=")(?!display:none)(?<mt>[^"<>]+)(?=") 

Options: ^ and $ match at line breaks

Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=<\w+ [^<>]*?\w+=")» 
    Match the character 「<」 literally «<» 
    Match a single character that is a 「word character」 (letters, digits, and underscores) «\w+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the character 「 」 literally « » 
    Match a single character NOT present in the list 「<>」 «[^<>]*?» 
     Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?» 
    Match a single character that is a 「word character」 (letters, digits, and underscores) «\w+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the characters 「="」 literally «="» 
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!display:none)» 
    Match the characters 「display:none」 literally «display:none» 
Match the regular expression below and capture its match into backreference with name 「mt」 «(?<mt>[^"<>]+)» 
    Match a single character NOT present in the list 「"<>」 «[^"<>]+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")» 
    Match the character 「"」 literally «"» 

Pattern [PCRE]

(<\w+ [^<>]*?\w+=")(?!display:none)([^"<>]+)(?=") 

Options: ^ and $ match at line breaks

Match the regular expression below and capture its match into backreference number 1 «(<\w+ [^<>]*?\w+=")» 
    Match the character 「<」 literally «<» 
    Match a single character that is a 「word character」 (letters, digits, and underscores) «\w+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the character 「 」 literally « » 
    Match a single character NOT present in the list 「<>」 «[^<>]*?» 
     Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?» 
    Match a single character that is a 「word character」 (letters, digits, and underscores) «\w+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    Match the characters 「="」 literally «="» 
Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!display:none)» 
    Match the characters 「display:none」 literally «display:none» 
Match the regular expression below and capture its match into backreference number 2 «([^"<>]+)» 
    Match a single character NOT present in the list 「"<>」 «[^"<>]+» 
     Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=")» 
    Match the character 「"」 literally «"» 
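A quick way to check what this pattern captures is to run the PCRE variant on a few lines of the sample. The sketch below uses Python's `re` module, which accepts the PCRE form as written (the .NET form relies on a variable-length lookbehind, which `re` does not support). The function name `captured_values` is illustrative, not part of the answer.

```python
import re

# The PCRE variant from the answer: group 1 is the tag up to an opening quote,
# group 2 is the quoted value, provided it does not start with display:none.
PATTERN = re.compile(r'(<\w+ [^<>]*?\w+=")(?!display:none)([^"<>]+)(?=")')


def captured_values(html):
    """Return what group 2 captures for every match in html."""
    return [match.group(2) for match in PATTERN.finditer(html)]


for line in (
    '<span>spanaaa</span>',
    '<span class="1">spanbbb</span>',
    '<span class="" style="">spanccc</span>',
    '<span style="display:none">spanddd</span>',
):
    print(line, "->", captured_values(line))
```

Group 2 sits between the opening quote and the closing quote, so what it captures is a non-empty attribute value rather than the element's text. It is worth comparing the result against the expected output in the question before relying on this pattern.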

0

input = Regex.Replace(input, @"<div style=""display:none"">(.|\n)*?</div>", string.Empty, RegexOptions.Singleline); 

Here, input is the string containing the HTML. Try this regex; it will work, InshAllah!
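The same removal step can be sketched in Python for readers who are not on .NET. This is an illustrative translation, with `HIDDEN_DIV` and `remove_hidden_divs` as made-up names; `re.DOTALL` plays the role of `RegexOptions.Singleline`, so a plain `.*?` is enough to span line breaks.

```python
import re

# Matches a <div> whose only attribute is exactly style="display:none",
# together with everything up to the first closing </div>.
HIDDEN_DIV = re.compile(r'<div style="display:none">.*?</div>', re.DOTALL)


def remove_hidden_divs(html):
    """Delete hidden <div> elements; all other markup is left in place."""
    return HIDDEN_DIV.sub("", html)


print(remove_hidden_divs('<div>divaaa</div><div style="display:none">divddd</div>'))
```

Like the original, this only removes `div` elements written in exactly this form. Hidden `span` elements, extra attributes, different spacing or quoting, and nested `div`s inside the hidden one are not handled, and the remaining tags still have to be stripped to get the plain values.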

0

Here is a C# version that is 8 times faster than parsing with a regex. You can easily convert it to any language you like.

public static string StripTagsCharArray(string source)
{
    // The result can never be longer than the input, so one buffer of that size is enough.
    char[] array = new char[source.Length];
    int arrayIndex = 0;
    bool inside = false; // true while the scan is between '<' and '>'

    for (int i = 0; i < source.Length; i++)
    {
        char let = source[i];
        if (let == '<')
        {
            inside = true;
            continue;
        }
        if (let == '>')
        {
            inside = false;
            continue;
        }
        if (!inside)
        {
            // Copy only the characters that lie outside any tag.
            array[arrayIndex] = let;
            arrayIndex++;
        }
    }
    return new string(array, 0, arrayIndex);
}
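As an example of converting the loop to another language, here is a rough Python port. The name `strip_tags` is made up for the sketch, and the behaviour is meant to mirror the C# method: every character between `<` and `>` is dropped, everything else is kept.

```python
def strip_tags(source):
    """Remove everything between '<' and '>' using a single pass."""
    kept = []
    inside = False  # True while the scan is inside a tag
    for char in source:
        if char == "<":
            inside = True
        elif char == ">":
            inside = False
        elif not inside:
            kept.append(char)  # character lies outside any tag
    return "".join(kept)


print(strip_tags('<div class="1">divbbb</div>'))
```

This strips the markup but knows nothing about attributes, so the text of elements with style="display:none" is kept as well; hidden elements would have to be removed beforehand. A literal `<` or `>` in text or inside an attribute value will also confuse the scan.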