2012-04-14

NullReferenceException in HtmlAgilityPack

I am trying to use XPath to extract a link from the URL below:

string url = "http://www.album-cover-art.org/search.php?q=Ruin+-+Live+Album+Version+Lamb+of+God";

My code to extract the link:

HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument htmlDoc = web.Load(url); // exception thrown here (Program.cs, line 23)

if (htmlDoc.DocumentNode != null) 
{ 
    HtmlNode linkNode = htmlDoc.DocumentNode.SelectSingleNode(".//*[@id='related_search_row']/img/@src"); 
    if (linkNode != null) 
     Console.WriteLine(linkNode.InnerText); 
} 

The code above compiles fine, but when I run it, it throws this exception:

Unhandled Exception: System.NullReferenceException: Object reference not set to an instance of an object. 

Complete stack trace:

System.NullReferenceException: Object reference not set to an instance of an object. 
    at HtmlAgilityPack.HtmlDocument.ReadDocumentEncoding(HtmlNode node) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1916 
    at HtmlAgilityPack.HtmlDocument.PushNodeEnd(Int32 index, Boolean close) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1805 
    at HtmlAgilityPack.HtmlDocument.Parse() in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 1468 
    at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 769 
    at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1515 
    at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1563 
    at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1149 
    at HtmlAgilityPack.HtmlWeb.Load(String url) in C:\Source\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1107 
    at ScreenScrapping.Program.Main(String[] args) in c:\Users\ranveer\csharp\ScreenScrapping\ScreenScrapping\Program.cs:line 23 

So my question is: why am I getting this exception?

With HtmlAgilityPack version '1.4.3' your sample works fine. Which version are you using? – nemesv 2012-04-14 07:57:58

@nemesv: I have switched to HtmlAgilityPack v1.4.3 and no longer get the error, but 'Console.WriteLine(linkNode.InnerText);' prints nothing, even though I checked that 'linkNode' is not null. – RanRag 2012-04-14 09:17:22

When I use the XPath '//title/text()' it works fine, but when I switch to an XPath expression that accesses an attribute via '/@href' or '/@src', it doesn't work. – RanRag 2012-04-14 09:18:41

Answer

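This is a bug in HtmlAgilityPack. The document you are trying to parse contains <meta http-equiv="Content-Type" content="text/html; charset=iso-utf-8">, and AgilityPack cannot resolve the charset value (iso-utf-8) to a valid encoding name. As Simon Mourier said, this bug was introduced in 1.4.0.0.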

To work around it, load the document from a stream yourself and set the encoding manually, like this:

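using System.Net;
using System.Text;
using HtmlAgilityPack;

var htmlDoc = new HtmlDocument();
htmlDoc.OptionReadEncoding = false; // don't detect the encoding from the page's <meta> tag
var request = (HttpWebRequest)WebRequest.Create(url);
request.Method = "GET";
using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
{
    // Force UTF-8 instead of the page's unrecognized "iso-utf-8" charset.
    htmlDoc.Load(stream, Encoding.UTF8);
}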
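The workaround hardcodes UTF-8 for every page. If you would rather keep honoring well-formed charset declarations and only fall back when the name is unknown, a minimal sketch could look like the one below. It relies on the fact that .NET's Encoding.GetEncoding throws ArgumentException for a name it does not recognize, such as "iso-utf-8". CharsetResolver, ExtractCharset and ResolveOrDefault are hypothetical names for illustration, not part of HtmlAgilityPack.

```csharp
using System;
using System.Text;

// Hypothetical helper, not HtmlAgilityPack API: turns a declared charset
// into an Encoding, falling back to a default when the name is invalid.
public static class CharsetResolver
{
    // Pulls the charset value out of a Content-Type string such as
    // "text/html; charset=iso-utf-8". Returns null if none is present.
    public static string ExtractCharset(string contentType)
    {
        if (contentType == null)
            return null;
        foreach (var part in contentType.Split(';'))
        {
            var trimmed = part.Trim();
            if (trimmed.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                return trimmed.Substring("charset=".Length).Trim().Trim('"');
        }
        return null;
    }

    // Returns the named encoding, or `fallback` if the name is missing
    // or not recognized by Encoding.GetEncoding.
    public static Encoding ResolveOrDefault(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            return Encoding.GetEncoding(charset.Trim());
        }
        catch (ArgumentException)
        {
            // Unknown names such as "iso-utf-8" end up here.
            return fallback;
        }
    }
}
```

The result can be passed to htmlDoc.Load(stream, encoding) in place of Encoding.UTF8, with OptionReadEncoding still set to false. Where the charset string comes from (the response's ContentType header or the page's <meta> tag) is up to you; the sketch only handles the parsing and the fallback.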
Thanks for your answer, it works well. I am now using HtmlAgilityPack v1.4.3 and no longer get any error, but 'Console.WriteLine(linkNode.InnerText);' prints nothing, even though I checked that 'linkNode' is not null. – RanRag 2012-04-14 09:19:53

When I use the XPath '//title/text()' it works fine, but when I switch to an XPath expression that accesses an attribute via '/@href' or '/@src', it doesn't work. – RanRag 2012-04-14 09:20:51

@Noob, try something like "//a[@href]": use square brackets to refer to the attribute. – Alex 2012-04-14 09:23:42
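A likely explanation for the empty output: HtmlAgilityPack's SelectSingleNode returns an HtmlNode, and as far as I can tell from the 1.4.x source, a path ending in an attribute step such as /@src hands back the element that owns the attribute (the <img>), whose InnerText is empty. A predicate like //a[@href] also returns the element, not the attribute value. The usual HtmlAgilityPack idiom is to select the element (".//*[@id='related_search_row']/img") and read linkNode.GetAttributeValue("src", ""). The sketch below does not use HtmlAgilityPack. It uses the BCL's System.Xml.XPath on a small, hypothetical, well-formed fragment to show the difference between an attribute node (whose value is the URL) and an element (whose text is empty).

```csharp
using System.IO;
using System.Xml.XPath;

// Illustration with System.Xml.XPath, not HtmlAgilityPack. The markup is a
// hypothetical, well-formed stand-in for the page's search-result row.
public static class AttributeXPathDemo
{
    const string Sample =
        "<html><body><div id='related_search_row'>" +
        "<img src='cover.jpg'/></div></body></html>";

    // The XPath ends in an attribute step, so the selected node is the
    // attribute itself and its Value is the URL.
    public static string SrcViaAttributeStep()
    {
        var nav = new XPathDocument(new StringReader(Sample)).CreateNavigator();
        XPathNavigator attr = nav.SelectSingleNode("//*[@id='related_search_row']/img/@src");
        return attr?.Value;
    }

    // The XPath selects the element: its text content is empty, and the
    // URL has to be read from the element's attribute.
    public static (string text, string src) ViaElement()
    {
        var nav = new XPathDocument(new StringReader(Sample)).CreateNavigator();
        XPathNavigator img = nav.SelectSingleNode("//*[@id='related_search_row']/img");
        return (img.Value, img.GetAttribute("src", string.Empty));
    }
}
```

The empty ViaElement().text mirrors the empty linkNode.InnerText from the comments: the URL is never part of an element's text, only of its attribute.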