Split paragraphs from an HTML string and remove empty paragraphs

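I have an HTML string and want to split all of its paragraphs into an ArrayList. The resulting paragraphs must not be empty: each one should contain some actual text. If a paragraph contains only HTML markup with no real text inside it, such as <htmltag>     </htmltag>, it should be discarded rather than added to the list.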

This is how I currently split the paragraphs out of the HTML string:

// Find every <p>...</p> block in the HTML string.
System.Text.RegularExpressions.Match m = System.Text.RegularExpressions.Regex.Match(htmlString, @"<p>\s*(.+?)\s*</p>");
ArrayList groupCollection = new ArrayList();
while (m.Success)
{
    groupCollection.Add(m.Value);
    m = m.NextMatch();
}
// Copy every match into the result list (empty paragraphs are not filtered out yet).
ArrayList paragraphs = new ArrayList();
if (groupCollection.Count > 0)
{
    foreach (object item in groupCollection)
    {
        paragraphs.Add(item);
    }
}

The code above splits out all the paragraphs, but it cannot tell which of them are empty in the sense described above.

Asked 2013-03-19 by Tri Nguyen Dung

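What have you tried? – 2013-03-19 04:09:57

I have tried using regular expressions to split all the paragraphs out of the HTML string, but I then have no way of knowing whether a paragraph is empty. – 2013-03-19 04:11:01

Can you post your code along with the question? – 2013-03-19 04:11:46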

I found an answer to my own question. This is my edited code:

// Requires: using System.Collections; using System.Text.RegularExpressions;
// Matches each <p>...</p> block in the HTML string.
Regex paragraphRx = new Regex(@"<p>\s*(.+?)\s*</p>");
// Matches any HTML tag so that it can be stripped out.
Regex tagRx = new Regex("<[^>]*>");
ArrayList paragraphs = new ArrayList();
foreach (Match m in paragraphRx.Matches(htmlString))
{
    // Remove all tags, then the &nbsp; entities, leaving only the visible text.
    string text = tagRx.Replace(m.Value, "").Replace("&nbsp;", "");
    // Keep the paragraph only if some non-whitespace text remains.
    // (Trim also rejects paragraphs such as <p><b> </b></p>.)
    if (text.Trim().Length > 0)
    {
        paragraphs.Add(m.Value);
    }
}

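About the code above: as you can see, I first remove all HTML tags from each paragraph and then remove the whitespace entities (&nbsp;) from what remains. Whatever is left tells me whether the paragraph is blank.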
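For anyone who wants this as a reusable helper, here is a minimal sketch of the same idea. The names ParagraphSplitter and SplitNonEmptyParagraphs are made up for illustration, and the sketch makes a few assumptions that go beyond the code above: it accepts <p> tags with attributes, lets a paragraph span several lines, and uses WebUtility.HtmlDecode (available in System.Net from .NET 4.0) so that &nbsp; and similar entities are treated as whitespace. Regular expressions are still a rough tool for HTML, so nested or malformed markup may need a real parser.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class ParagraphSplitter
{
    // Matches <p>...</p>, allowing attributes on the opening tag.
    // Singleline lets '.' match line breaks; IgnoreCase also accepts <P>.
    private static readonly Regex ParagraphRx = new Regex(
        @"<p(\s[^>]*)?>(.*?)</p>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Matches any HTML tag so that it can be stripped out.
    private static readonly Regex TagRx = new Regex("<[^>]*>");

    // Returns every paragraph (including its <p> tags) that contains
    // at least one non-whitespace character of visible text.
    public static List<string> SplitNonEmptyParagraphs(string html)
    {
        var paragraphs = new List<string>();
        foreach (Match m in ParagraphRx.Matches(html))
        {
            // Strip the tags inside the paragraph, then decode entities
            // so that &nbsp; becomes a non-breaking space character.
            string text = TagRx.Replace(m.Groups[2].Value, "");
            text = WebUtility.HtmlDecode(text);

            // A non-breaking space counts as whitespace here.
            if (!string.IsNullOrWhiteSpace(text))
            {
                paragraphs.Add(m.Value);
            }
        }
        return paragraphs;
    }
}
```

Calling ParagraphSplitter.SplitNonEmptyParagraphs(htmlString) returns a List<string> instead of an ArrayList, which avoids the ToString() calls on each item.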

Answered 2013-03-19 04:57:42
