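Writing XML node content with StreamWriter while ignoring child nodes, using C#
I am trying to write a program that reads an RSS news feed and writes each article's date, title, and body to a .txt file. I only started learning C# two days ago, but I have experience with other languages. The program works for some feeds, but for others (Reuters, for example) each article body is followed by an "email this article" style link, and I can't seem to get rid of it when copying the text. I run the program on the whole feed.
For example, this is the XML for one news item: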
<item>
<title>Pimco's Ivascyn sees 'significant' opportunity in European bank assets</title>
<link>http://feeds.reuters.com/~r/news/wealth/~3/vUJ74S5mXQg/story01.htm</link>
<category domain="">PersonalFinance</category>
<pubDate>Mon, 16 Jun 2014 15:37:52 GMT</pubDate>
<guid isPermaLink="false">http://www.reuters.com/article/2014/06/16/us-investing-pimco-ivascyn-idUSKBN0ER1VV20140616?feedType=RSS&feedName=PersonalFinance</guid>
<description>NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/></description>
<feedburner:origLink>http://reuters.us.feedsportal.com/c/35217/f/654211/s/3b8e7c6b/sc/2/l/0L0Sreuters0N0Carticle0C20A140C0A60C160Cus0Einvesting0Epimco0Eivascyn0EidUSKBN0AER1VV20A140A6160DfeedType0FRSS0GfeedName0FPersonalFinance/story01.htm</feedburner:origLink>
</item>
However, when I run the program I get:
Mon, 16 Jun 2014 15:37:52 GMT
Pimco's Ivascyn sees 'significant' opportunity in European bank assets
NEW YORK (Reuters) - The expected unloading of roughly $1 trillion in assets by European banks represents a "significant investment opportunity" in residential and commercial real estate as well as...<div class="feedflare">
<a href="http://feeds.reuters.com/~ff/news/wealth?a=vUJ74S5mXQg:y6BPXasLV5o:yIl2AUoC8zA"><img src="http://feeds.feedburner.com/~ff/news/wealth?d=yIl2AUoC8zA" border="0"></img></a>
</div><img src="http://feeds.feedburner.com/~r/news/wealth/~4/vUJ74S5mXQg" height="1" width="1"/>
**********
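I am trying to get rid of the markup on the last two lines, the part that follows the article body. I added the asterisks to separate the articles.
Here is my code: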
using System;
using System.IO;
using System.Text;
using System.Xml;

namespace XmlReading
{
    class RssReading
    {
        static void Main(string[] args)
        {
            // Create a StreamWriter object to write to a text file.
            // Backslashes must be escaped in a regular string literal.
            StreamWriter sw = new StreamWriter("C:\\Users\\Testing002.txt");
            XmlDocument xmlDoc = new XmlDocument();
            // Load the RSS feed page.
            xmlDoc.Load("http://feeds.reuters.com/news/wealth");
            // Select all item nodes.
            XmlNodeList itemNodes = xmlDoc.SelectNodes("//rss/channel/item");
            foreach (XmlNode itemNode in itemNodes)
            {
                // Read the title.
                XmlNode titleNode = itemNode.SelectSingleNode("title");
                // Read the date.
                XmlNode dateNode = itemNode.SelectSingleNode("pubDate");
                // Read the body.
                XmlNode bodyNode = itemNode.SelectSingleNode("description");
                if (titleNode != null && dateNode != null && bodyNode != null)
                {
                    /* XPath of the article body, and of the extra links.
                     * //*[@id="bodyblock"]/ul/li[2]/div/text()
                     * //*[@id="bodyblock"]/ul/li[2]/div/div
                     */
                    // Write to the console as well, just to check the output.
                    Console.WriteLine(dateNode.InnerText);
                    sw.WriteLine(dateNode.InnerText);
                    Console.WriteLine(titleNode.InnerText);
                    sw.WriteLine(titleNode.InnerText);
                    // Value is null for element nodes, so use InnerText here too.
                    Console.WriteLine(bodyNode.InnerText);
                    sw.WriteLine(bodyNode.InnerText);
                    Console.WriteLine("**********\n\n\n");
                    sw.WriteLine("**********\n\n\n");
                    sw.WriteLine(" ");
                    sw.WriteLine(" ");
                }
            }
            sw.Close();
            Console.ReadKey(true);
        }
    }
}
Thanks in advance for any help or suggestions.
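Your "XML code" is not the XML structure of the RSS feed; it is its HTML rendering. Please provide the XML structure you are actually trying to process.
Sorry, my bad. I have corrected it now. – user3748452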
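One possible way to drop the trailing markup is to keep only the text that sits directly inside the description node, and then cut off anything that starts with a div tag in case the feed delivers its HTML escaped as plain text. The following is a minimal sketch under those assumptions, not a definitive fix: FeedText, DirectText, and CutAtMarkup are hypothetical names that do not appear in the question, and the "<div" cut-off is only a guess based on the feedflare block shown in the sample item.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper class; the names are illustrative only.
static class FeedText
{
    // Collects the text and CDATA nodes that are direct children of the node,
    // skipping child elements such as <div> or <img>.
    public static string DirectText(XmlNode node)
    {
        var sb = new StringBuilder();
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text ||
                child.NodeType == XmlNodeType.CDATA)
            {
                sb.Append(child.Value);
            }
        }
        // If the feed escaped its HTML, the markup arrives as part of the
        // text node, so trim it off as well.
        return CutAtMarkup(sb.ToString());
    }

    // Assumption: everything from the first "<div" onward is feed decoration.
    public static string CutAtMarkup(string text)
    {
        int index = text.IndexOf("<div", StringComparison.OrdinalIgnoreCase);
        string body = index >= 0 ? text.Substring(0, index) : text;
        return body.Trim();
    }
}
```

In the loop from the question, this would mean writing FeedText.DirectText(bodyNode) instead of bodyNode.InnerText. InnerText concatenates the text of all descendants, which is why the link markup ends up in the output; walking ChildNodes and filtering on XmlNodeType keeps only the node's own text.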