Get all RSS links on a website

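I'm currently writing a very basic program that first goes through a website's HTML to find all RSS links, then puts those links into an array and parses the content behind each link into an existing XML file.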

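However, I'm still learning C# and I'm not yet familiar with all of its classes. I have done all of this before by writing my own class with get_file_contents(), and also by using cURL. I managed to do it in Java as well. Anyway, I'm trying to achieve the same result in C#, but I think I'm doing something wrong here.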

TL;DR: What is the best way to write a regular expression that finds all RSS links on a website?

So far, my code looks like this:

private List<string> getRSSLinks(string websiteUrl)
{
    List<string> links = new List<string>();
    MatchCollection collection = Regex.Matches(websiteUrl, @"(<link.*?>.*?</link>)", RegexOptions.Singleline);

    foreach (Match singleMatch in collection)
    {
        string text = singleMatch.Groups[1].Value;
        Match matchRSSLink = Regex.Match(text, @"type=\""(application/rss+xml)\""", RegexOptions.Singleline);
        if (matchRSSLink.Success)
        {
            links.Add(text);
        }
    }

    return links;
}

Source

2012-05-27 Nikkster

Don't use regular expressions to parse HTML. Use an HTML parser instead; see this link for an explanation.
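To give a concrete idea of how brittle the regex approach is, here is a minimal sketch of two reasons the patterns in the question find nothing on a typical page. The class name, method names, and sample markup are made up for illustration. The unescaped `+` in `rss+xml` is a quantifier, not a literal plus sign, and an HTML `<link>` element normally has no closing `</link>` tag.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfallDemo
{
    // Hypothetical sample markup, not taken from any real site.
    public const string SampleHtml =
        "<link rel=\"alternate\" type=\"application/rss+xml\" href=\"/feed.xml\">";

    // Unescaped '+' means "one or more 's'", so the literal text "rss+xml" never matches.
    public static bool MatchesUnescaped(string html) =>
        Regex.IsMatch(html, @"type=""application/rss+xml""");

    // Regex.Escape turns '+' into a literal character.
    public static bool MatchesEscaped(string html) =>
        Regex.IsMatch(html, "type=\"" + Regex.Escape("application/rss+xml") + "\"");

    // A pattern that requires </link> finds nothing when the tag is never closed.
    public static bool MatchesWithClosingTag(string html) =>
        Regex.IsMatch(html, @"<link.*?>.*?</link>", RegexOptions.Singleline);

    public static void Main()
    {
        Console.WriteLine(MatchesUnescaped(SampleHtml));      // False
        Console.WriteLine(MatchesEscaped(SampleHtml));        // True
        Console.WriteLine(MatchesWithClosingTag(SampleHtml)); // False
    }
}
```

Even with the escaping fixed, a regex would still have to cope with attribute order, single quotes, and odd whitespace, which is exactly what a parser handles for you.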

I like HtmlAgilityPack for parsing HTML:

// Requires: using System.Linq; using System.Net; plus the HtmlAgilityPack package.
using (var client = new WebClient())
{
    var doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(client.DownloadString("http://www.xul.fr/en-xml-rss.html"));

    // Collect the href of every <link> whose type attribute is application/rss+xml.
    var rssLinks = doc.DocumentNode.Descendants("link")
        .Where(n => n.Attributes["type"] != null && n.Attributes["type"].Value == "application/rss+xml")
        .Select(n => n.Attributes["href"].Value)
        .ToArray();
}
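If you want something a bit more defensive, the same idea can be wrapped in a helper. This is only a sketch under a few assumptions: `FeedLinkFinder` and `FindFeedLinks` are hypothetical names, the HTML has already been downloaded, and you want absolute URLs back. It skips `<link>` elements with no `href`, compares the `type` value case-insensitively, and resolves relative hrefs such as `/feed.xml` against the page URL.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class FeedLinkFinder
{
    // Hypothetical helper: returns absolute RSS feed URLs found in <link> elements.
    public static List<string> FindFeedLinks(string html, Uri pageUri)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        foreach (var node in doc.DocumentNode.Descendants("link"))
        {
            // GetAttributeValue returns the default ("") when the attribute is missing.
            string type = node.GetAttributeValue("type", "").Trim();
            string href = node.GetAttributeValue("href", "").Trim();

            bool isRss = string.Equals(type, "application/rss+xml",
                                       StringComparison.OrdinalIgnoreCase);
            if (!isRss || href.Length == 0)
                continue;

            // Resolve relative hrefs against the URL the page was loaded from.
            if (Uri.TryCreate(pageUri, href, out Uri absolute))
                links.Add(absolute.AbsoluteUri);
        }
        return links;
    }
}
```

You would call it with the downloaded page and its address, for example `FeedLinkFinder.FindFeedLinks(client.DownloadString(url), new Uri(url))`.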

Source

2012-05-27 17:00:03

Thank you very much! I've now got what I wanted working. Have a nice day! – Nikkster
