How to remove unmatched tags from HTML source in Asp.net

Sometimes the HTML source we receive from a website does not close its tags properly, and this breaks our UI. For example:

<br /><p>hello the para start here </p> <p>some text and no ending tag


I want to keep the HTML formatting and would like the result to look like this:

<br /><p>hello the para start here </p> some text and no ending tag

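One more thing: sometimes we also get a closing tag with no matching opening tag, and the algorithm should handle that case as well.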
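A minimal sketch of the behavior asked for above, written in C# to match the thread. It is not the answerer's code: it uses a regular expression to find `<tag ...>` and `</tag>` occurrences for one tag name, pairs them with a stack, and deletes only the unpaired tags (unclosed openers and stray closers), so the text between them survives. The names `TagBalancer` and `RemoveUnmatchedTags` are hypothetical, and the sketch assumes tags never appear inside comments, `<script>` blocks or attribute values; for arbitrary real-world HTML, a real parser such as the tools suggested in the comments is safer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TagBalancer
{
    // Hypothetical helper: deletes <tag ...> and </tag> occurrences that have no
    // partner, keeping the text between them. Matching is case-insensitive.
    // Assumes tags never appear inside comments, scripts or attribute values.
    public static string RemoveUnmatchedTags(string html, string tag)
    {
        var pattern = new Regex(@"<(/?)" + Regex.Escape(tag) + @"\b[^>]*>",
                                RegexOptions.IgnoreCase);
        var open = new Stack<Match>();      // opening tags still waiting for a close
        var unmatched = new List<Match>();  // tags to delete

        foreach (Match m in pattern.Matches(html))
        {
            if (m.Value.EndsWith("/>", StringComparison.Ordinal))
                continue;                   // self-closing form such as <p/>: leave it alone

            if (m.Groups[1].Value != "/")
                open.Push(m);               // opening tag
            else if (open.Count > 0)
                open.Pop();                 // closing tag pairs with the latest opener
            else
                unmatched.Add(m);           // closing tag with no opener
        }
        unmatched.AddRange(open);           // openers that were never closed

        // Rebuild the string in document order, skipping the unmatched tags.
        unmatched.Sort((a, b) => a.Index.CompareTo(b.Index));
        var sb = new StringBuilder(html.Length);
        int pos = 0;
        foreach (Match m in unmatched)
        {
            sb.Append(html, pos, m.Index - pos);
            pos = m.Index + m.Length;
        }
        sb.Append(html, pos, html.Length - pos);
        return sb.ToString();
    }
}
```

For example, `TagBalancer.RemoveUnmatchedTags("<br /><p>hello the para start here </p> <p>some text and no ending tag", "p")` returns `<br /><p>hello the para start here </p> some text and no ending tag`, the output requested in the question. Because the regex requires a word boundary after the tag name, `"p"` does not match `<param>` or `<pre>`.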


2012-07-10 Abdur Rahman

Is this homework? It sounds like building an HTML compiler... – Usman 2012-07-10 08:10:09

I am working on article translation and want to keep the article's original formatting; this is for GabbleOn.com – 2012-07-10 08:11:42

See http://tidy.sourceforge.net/ or http://sourceforge.net/projects/tidynet/ - it will effectively try to convert the HTML into (X)HTML-compliant markup. Or see http://stackoverflow.com/questions/787932/using-c-sharp-regular-expressions-to-remove-html-tags?rq=1 or http://stackoverflow.com/questions/846994/how-to-use-html-agility-pack - Stack Overflow has plenty of resources on this. – dash 2012-07-10 08:12:02

Hi, I thought about this for a long time and finally have code for my problem. I am posting it here so others can benefit from it....

using System;
using System.Collections.Generic;
using System.Text;

public static class HtmlCleaner
{
    // Removes unmatched <tag> and </tag> tokens (for example tag = "p") from an
    // HTML fragment while keeping the text. The input is split on whitespace,
    // so only bare tags such as <p> are recognized; <p class="x"> is not.
    // The output is re-joined with single spaces.
    public static string RemoveIncompleteTags(string source, string tag)
    {
        string open = "<" + tag.ToLowerInvariant() + ">";
        string close = "</" + tag.ToLowerInvariant() + ">";

        // Normalize non-breaking spaces (entity or U+00A0), line breaks and tabs to spaces.
        // ("\n", "\r", "\t" are escape sequences; "/n" would match the literal text "/n".)
        source = source.Replace("&nbsp;", " ").Replace('\u00A0', ' ');
        source = source.Replace("\r", " ").Replace("\n", " ").Replace("\t", " ");

        // Drop empty pairs such as <p></p> and <p> </p>.
        source = source.Replace("<" + tag + "></" + tag + ">", string.Empty);
        source = source.Replace("<" + tag + "> </" + tag + ">", string.Empty);

        // Put spaces around every tag so each tag becomes its own token,
        // then undo spaces inside tags such as "<p >" or "< p>".
        source = source.Replace(">", "> ").Replace("<", " <");
        source = source.Replace(" >", ">").Replace("< ", "<");
        string[] tokens = source.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        bool[] keep = new bool[tokens.Length];
        // Positions of opening tags not yet closed. A Stack keeps them in order;
        // Dictionary.Last() does not guarantee insertion order.
        var openIndexes = new Stack<int>();

        for (int i = 0; i < tokens.Length; i++)
        {
            string word = tokens[i].ToLowerInvariant();
            if (word == open)
            {
                openIndexes.Push(i);
                keep[i] = true;
            }
            else if (word == close)
            {
                // Keep a closing tag only if it matches an earlier opening tag;
                // a stray closing tag is dropped (log it here if needed).
                if (openIndexes.Count > 0)
                {
                    openIndexes.Pop();
                    keep[i] = true;
                }
            }
            else
            {
                keep[i] = true;
            }
        }

        // Opening tags that were never closed are dropped as well.
        foreach (int i in openIndexes)
            keep[i] = false;

        var result = new StringBuilder();
        for (int i = 0; i < tokens.Length; i++)
        {
            if (keep[i])
                result.Append(tokens[i]).Append(' ');
        }
        return result.ToString().Trim();
    }
}

Thanks...


2012-07-12 06:17:02

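This is a very brave effort, but trying to process (possibly malformed) HTML with regular expressions or string-pattern matching leads to a world of pain. dash's suggestion of Tidy above seems like a better path. – 2012-07-12 21:05:58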
