C#: stripping HTML tags and decoding entities

1

Use a regular expression replace with the pattern <.*?> to remove the tags, and the HttpUtility class to decode the entities.
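A minimal sketch of that approach, assuming a project that does not reference System.Web, so System.Net.WebUtility.HtmlDecode stands in for HttpUtility.HtmlDecode. The class name HtmlText and the method name StripAndDecode are made up for illustration, and RegexOptions.Singleline is an added assumption so that a tag spanning a line break is still matched. Tags are removed before decoding so that an encoded &lt; is not mistaken for the start of a tag.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: removes anything that looks like a tag,
    // then turns entities such as &lt; and &amp; back into characters.
    public static string StripAndDecode(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Singleline lets '.' match newlines, so multi-line tags are removed too.
        string withoutTags = Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);

        // Decode last: decoding first could create new '<' characters
        // that the regex would then treat as markup.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Calling HtmlText.StripAndDecode on the string <p>Some &lt; text</p> strips the two p tags and then decodes the entity. A regex like this is only a rough filter; it does not understand comments, scripts, or a literal > inside an attribute value.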

2012-10-29 19:41:26 Fugiczek

5

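// Requires the HtmlAgilityPack package, plus: using System.Linq; using System.Web;
string html = @"<textarea cols=""5"">Some &lt; text</textarea>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Take the text of the first textarea element, then decode its entities
var text = doc.DocumentNode.Descendants("textarea").First().InnerText;
var decodedText = HttpUtility.HtmlDecode(text);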


2012-06-10 18:45:42

+0

HtmlAgilityPack is very good. – Cyral

+0

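I don't know which is better. I'm using SkypeKit for an application, but it converts smileys and entities into a format like this: :D => :D wrapped in tags, < => &lt;. My idea is to just strip the tags and decode the entities, then replace the smileys with a function of my own. I don't like using an external library with a thousand functions when I only need one of them. – sczdavos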

0

Here is the complete code:

Stripping the tags.

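// Requires: using System.Text.RegularExpressions;
public static string StripTags(string source)
{
    // Non-greedy match from each '<' to the nearest '>'
    return Regex.Replace(source, "<.*?>", string.Empty);
}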

Decoding the entities.

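// Requires: using System.Web; (System.Net.WebUtility.HtmlDecode is an alternative outside ASP.NET)
public static string DecodeHtmlEntities(string text)
{
    return HttpUtility.HtmlDecode(text);
}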


2012-08-17 06:39:14 sczdavos

1

I'd like to share the code I wrote to do this. I like PHP, but my job is in C#, so I recreated the StripTag function. Here is how to use it.

For example:

string exampleOneWithAllStripped = StripTag("<br />this is an <b>example</b>", null); 

string exampleTwoWithOnlyBoldAllowed = StripTag("<br />this is an <b>example</b>", "b"); 

string exampleThreeWithBRandBoldAllowed = StripTag("<br />this is an <b>example</b>", "b,<br>"); 

    /// <summary> 
    ///  Strips HTML and other markup tags from the given string, except those named in listOfAllowedTags.
    ///  This method is the ASP.NET version of the PHP strip_tags function. It strips out all HTML and XML tags
    ///  except for the ones explicitly allowed in the second parameter. Attributes on allowed tags are NOT
    ///  stripped.
    /// </summary> 
    /// <param name="htmlString"> 
    ///  The HTML string. 
    /// </param> 
    /// <param name="listOfAllowedTags"> 
    ///  The list of allowed tags. If null, no tags are allowed; otherwise, e.g. "b,<br/>,<hr>,p,i,<u>"
    /// </param> 
    /// <returns> 
    ///  Cleaned String 
    /// </returns> 
    /// <author>James R.</author> 
    /// <createdate>10-27-2011</createdate> 
    public static string StripTag(string htmlString, string listOfAllowedTags) 
    { 
     if (string.IsNullOrEmpty(htmlString)) 
     { 
      return htmlString; 
     } 

     // this is the reg pattern that will retrieve all tags 
     string patternThatGetsAllTags = "</?[^><]+>"; 

     // Create the Regex for all of the Allowed Tags 
     string patternForTagsThatAreAllowed = string.Empty; 
     if (!string.IsNullOrEmpty(listOfAllowedTags)) 
     { 
      // get the HTML starting tag, such as p,i,b from an example string of <p>,<i>,<b> 
      Regex remove = new Regex("[<>\\/ ]+"); 

      // now strip out /, <, > and spaces
      listOfAllowedTags = remove.Replace(listOfAllowedTags, string.Empty); 

      // split at the commas 
      string[] listOfAllowedTagsArray = listOfAllowedTags.Split(','); 

      foreach (string allowedTag in listOfAllowedTagsArray) 
      { 
       if (string.IsNullOrEmpty(allowedTag)) 
       { 
        // jump to next element of array. 
        continue; 
       } 

       string patternVersion1 = "<" + allowedTag + "/?>"; // <p>, or <br/> written without a space

       // tag with attributes or a spaced self-close: <img src=stuff> or <hr style="width:50%;" />
       string patternVersion2 = "<" + allowedTag + " [^><]*>$";
       string patternVersion3 = "</" + allowedTag + ">"; // closing tag

       // if it is not the first time, then add the pipe | to the end of the string 
       if (!string.IsNullOrEmpty(patternForTagsThatAreAllowed)) 
       { 
        patternForTagsThatAreAllowed += "|"; 
       } 

       patternForTagsThatAreAllowed += patternVersion1 + "|" + patternVersion2 + "|" + patternVersion3; 
      } 
     } 

     // Get all html tags included in the string 
     Regex regexHtmlTag = new Regex(patternThatGetsAllTags); 

     if (!string.IsNullOrEmpty(patternForTagsThatAreAllowed)) 
     { 
      // Build the allowed-tag regex once, not once per matched tag
      Regex regOfAllowedTag = new Regex(patternForTagsThatAreAllowed);
      MatchCollection allTagsThatMatched = regexHtmlTag.Matches(htmlString);

      foreach (Match theTag in allTagsThatMatched)
      {
       Match matchOfTag = regOfAllowedTag.Match(theTag.Value);

       if (!matchOfTag.Success) 
       { 
        // if not allowed replace it with nothing 
        htmlString = htmlString.Replace(theTag.Value, string.Empty); 
       } 
      } 
     } 
     else 
     { 
      // else strip out all tags 
      htmlString = regexHtmlTag.Replace(htmlString, string.Empty); 
     } 

     return htmlString; 
    }
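
As a sketch of a more compact variant (not the author's code), the same allow-list idea can be done in a single Regex.Replace pass by capturing each tag's name and looking it up in a case-insensitive set. The names TagFilter and StripTagsExcept are hypothetical, and unlike StripTag above this version takes the allowed names as separate arguments ("b", "br") rather than one comma-separated string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagFilter
{
    // Matches an opening, closing, or self-closing tag and captures its name in group 1.
    private static readonly Regex AnyTag =
        new Regex(@"</?\s*([^\s<>/]+)[^<>]*>");

    // Hypothetical helper: keeps tags whose name is in allowedTags
    // (attributes included) and removes every other tag.
    public static string StripTagsExcept(string html, params string[] allowedTags)
    {
        if (string.IsNullOrEmpty(html))
        {
            return html;
        }

        // Case-insensitive lookup so <B> is treated like <b>.
        var allowed = new HashSet<string>(
            allowedTags ?? new string[0],
            StringComparer.OrdinalIgnoreCase);

        // The evaluator runs once per tag: keep it unchanged or replace it with nothing.
        return AnyTag.Replace(
            html,
            m => allowed.Contains(m.Groups[1].Value) ? m.Value : string.Empty);
    }
}
```

With no allowed names, TagFilter.StripTagsExcept("<br />this is an <b>example</b>") removes every tag; passing "b" keeps the bold pair and still drops the line break. Like the regex it is based on, this treats any <...> run as a tag, so text such as a < b and c > d loses its middle part.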


2015-03-25 16:47:39
