HTML find and replace href tags

Possible duplicate:
What is the best way to parse html in C#?

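I am parsing an HTML file. I need to find all the href tags (`<a href="...">` links) in the HTML and replace them with a text-friendly version.

Here is an example.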

Original text: <a href="http://foo.bar">click here</a>
Replacement value: click here <http://foo.bar>

How can I do this?

Asked 2012-10-29 by NewUnhandledException

Cue the regex flame war. – JDB

With a regular expression and backreferences. – entonio

@Cyborgx37 He didn't ask for a `regex`.. the question is **valid**. – Anirudha
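For readers wondering what the regex-with-backreferences suggestion in the comments might look like, here is a minimal sketch using only `System.Text.RegularExpressions`. The class and method names (`HrefRegexSketch`, `ReplaceLinks`) are hypothetical, and the pattern assumes double-quoted `href` values and links without nested anchors. It is brittle on real-world HTML, which is what the flame-war comment is hinting at.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefRegexSketch // hypothetical name, for illustration only
{
    public static string ReplaceLinks(string html)
    {
        // Group 1 captures the href value, group 2 the link text.
        // Assumption: href is double-quoted and the anchor has no nested <a>.
        return Regex.Replace(
            html,
            "<a\\s[^>]*?href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            "$2 <$1>", // backreferences: link text first, then the URL in angle brackets
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        // prints: click here <http://foo.bar>
        Console.WriteLine(ReplaceLinks("<a href=\"http://foo.bar\">click here</a>"));
    }
}
```

Single-quoted or unquoted attributes, comments, and malformed markup all fall outside this pattern, so a real parser is usually the safer choice.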

You can use the Html Agility Pack library, with code like this:

    using System;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.Load(myHtmlFile); // load your file

    // Select all A elements, at any depth, that declare an HREF attribute.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
    if (links != null)
    {
        foreach (HtmlNode node in links)
        {
            node.ParentNode.ReplaceChild(doc.CreateTextNode(node.InnerText + " <" + node.GetAttributeValue("href", null) + ">"), node);
        }
    }

    doc.Save(Console.Out); // output the new doc.
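To try the same idea on the example from the question without a file on disk, the snippet above could be wrapped roughly as follows. This is only a sketch: the class and method names (`HrefFlattener`, `Flatten`) are made up for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced. It uses `LoadHtml` to parse a string and `DocumentNode.OuterHtml` to read the result back.

```csharp
using System;
using HtmlAgilityPack;

public static class HrefFlattener // hypothetical name, not part of the library
{
    public static string Flatten(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // SelectNodes returns null when no <a href> elements are present.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode node in links)
            {
                string href = node.GetAttributeValue("href", "");
                // Swap the whole <a> element for a plain text node: "text <url>".
                node.ParentNode.ReplaceChild(
                    doc.CreateTextNode(node.InnerText + " <" + href + ">"), node);
            }
        }
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(Flatten("<p><a href=\"http://foo.bar\">click here</a></p>"));
    }
}
```

Note that the angle brackets around the URL are written as raw text, so if the output is meant to remain valid HTML rather than plain text, they would need to be encoded as `&lt;` and `&gt;`.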

Answered 2012-10-29 17:06:28

Just noting here (as required by http://meta.stackexchange.com/questions/156184) that Simon is one of the authors of the library he recommends. Its most notable competitor at the moment is [CsQuery](https://github.com/jamietre/CsQuery). –
