2013-03-22

WebResponse to HtmlDocument.LoadHtml?

I'm trying to extract images from posted links. The first check I do is whether the link simply points to an image, like this:

    HttpWebRequest request;
    WebResponse webresponse;

    request = (HttpWebRequest)HttpWebRequest.Create(url);

    webresponse = request.GetResponse();
    if (webresponse.ContentType.StartsWith("image/"))
        ...
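A Content-Type header can carry parameters (for example "text/html; charset=iso-8859-1") and its casing is not guaranteed, so a plain StartsWith check can be brittle. The sketch below is one possible way to compare only the media type, using System.Net.Mime.ContentType to parse the header. The class and method names (MediaTypes, MediaTypeOf, IsImage, IsHtml) are hypothetical helpers, not part of the original code.

```csharp
using System;
using System.Net.Mime;

// Hypothetical helper: classifies a Content-Type header by media type only.
static class MediaTypes
{
    // Returns the lower-cased media type, e.g. "text/html" for
    // "text/html; charset=iso-8859-1", or "" for a missing header.
    public static string MediaTypeOf(string contentTypeHeader)
    {
        if (string.IsNullOrWhiteSpace(contentTypeHeader))
            return string.Empty;
        try
        {
            return new ContentType(contentTypeHeader).MediaType.ToLowerInvariant();
        }
        catch (FormatException)
        {
            // Malformed header: fall back to the text before the first ';'.
            return contentTypeHeader.Split(';')[0].Trim().ToLowerInvariant();
        }
    }

    public static bool IsImage(string header) =>
        MediaTypeOf(header).StartsWith("image/", StringComparison.Ordinal);

    public static bool IsHtml(string header) =>
        MediaTypeOf(header) == "text/html";
}
```

With this, the check above would read MediaTypes.IsImage(webresponse.ContentType).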

If it is not an image, I want to continue with the HTML Agility Pack, but to do that I need to run:

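    var doc = new HtmlDocument();
    // new StreamReader(stream) decodes as UTF-8 unless the body starts with a byte-order mark
    using (var reader = new StreamReader(webresponse.GetResponseStream()))
    {
        doc.LoadHtml(reader.ReadToEnd());
    }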
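One side issue with reading the body through a default StreamReader: the page dumped further down declares charset=iso-8859-1 in a meta tag, while new StreamReader(stream) assumes UTF-8. That is not necessarily the cause of the symptom described next, but it would garble non-ASCII text such as "Välkommen". The self-contained sketch below illustrates the difference on in-memory bytes; EncodingDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo: the same Latin-1 bytes decoded two ways.
static class EncodingDemo
{
    // Mirrors the question's code: no encoding given, so UTF-8 is assumed
    // unless a byte-order mark says otherwise.
    public static string ReadWithDefault(byte[] body)
    {
        using (var reader = new StreamReader(new MemoryStream(body)))
            return reader.ReadToEnd();
    }

    // Decodes with an explicitly named charset, e.g. "iso-8859-1".
    public static string ReadWithCharset(byte[] body, string charset)
    {
        using (var reader = new StreamReader(new MemoryStream(body), Encoding.GetEncoding(charset)))
            return reader.ReadToEnd();
    }
}
```

Encoded as ISO-8859-1, "ä" is the single byte 0xE4, which is not valid UTF-8 on its own, so the default reader substitutes a replacement character. In real code the charset could come from the response header (HttpWebResponse.CharacterSet) when the server sends one.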

The problem is that LoadHtml doesn't seem to pick anything up, even though I'm sure the response contains the HTML source. Could the HTML be malformed?

Here is part of what ReadToEnd produces, in case it matters:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 

    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sv" lang="sv"> 
    <head><title> 
     X - Eclipse - 2011 
    </title> 

     <!--[if lt IE 7]> 
     <script defer type="text/javascript" src="../javascript/pngfix.js"></script> 
     <![endif]--> 
     <!--<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />--> 

     <meta http-equiv="Content-type" content="text/html; charset=iso-8859-1" /><link href="../../../App_Themes/X/mainStyleSheet.css" type="text/css" rel="stylesheet" /><meta name="author" content="" /><meta name="copyright" content="X.net" /><meta name="description" content="Välkommen in till ett av Sveriges största Xcommunity." /><meta name="keywords" content="X, rollspel, boardgamegeek, boardgame, X.net, X.net, community, Jimmy, Nilsson, schack, risk, puerto rico" /><script language="javascript" type="text/javascript" src="/sites/X/javascript/common.js"></script><script language="javascript" type="text/javascript" src="/sites/X/javascript/ajaxHandler.js"></script><script language="javascript" type="text/javascript" src="/javascript/jquery.js"></script><link rel="shortcut icon" href="/App_Themes/X/Images/common/browserIcon/favicon.ico" /><link rel="icon" href="/App_Themes/X/Images/common/browserIcon/animated_favicon1.gif" type="image/gif" /></head> 
    <body> 
     <div id="topBack"> 
     <div id="siteContainer"> 
     <form method="post" action="game.aspx?gameId=72125" id="aspnetForm" enctype="multipart/form-data"> 

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw.... 

I can see that the string contains some line breaks (\r\n). Does that matter?

My goal is simply to avoid downloading the page more than once; otherwise I could use WebClient.DownloadString(url) to get it in a format I know works.
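One general way to download only once is to copy the response stream into memory a single time and then hand out as many independent streams over those bytes as needed. The sketch below assumes the body fits comfortably in memory; ResponseBuffer, ReadAll and Open are hypothetical names, not part of any library.

```csharp
using System;
using System.IO;

// Hypothetical helper: buffer a response body once, reuse it many times.
static class ResponseBuffer
{
    // Copies a (typically non-seekable) network stream into a byte array.
    public static byte[] ReadAll(Stream source)
    {
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            return buffer.ToArray();
        }
    }

    // Returns a fresh read-only stream over the buffered bytes; each call
    // starts at position 0 and does not affect streams returned earlier.
    public static Stream Open(byte[] body) => new MemoryStream(body, writable: false);
}
```

With the bytes in hand, the image branch could write them to disk, and the HTML branch could pass ResponseBuffer.Open(body) to HtmlDocument.Load(Stream), all from a single request.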

Why don't you use the provided HtmlWeb class? http://stackoverflow.com/questions/13400493/htmlagilitypack-htmlweb-load-returning-empty-document – 2013-03-22 12:00:42

Please explain how that would solve my problem. Can I use this class to check the ContentType and then use it to populate doc.LoadHtml? – Banshee 2013-03-22 12:06:22

Please read the link. – 2013-03-22 12:07:30

Answer

This works:

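var doc = new HtmlDocument();
var request = (HttpWebRequest)WebRequest.Create(url);
using (var webresponse = (HttpWebResponse)request.GetResponse())
{
    if (webresponse.ContentType.StartsWith("image/"))
    {...}
    if (webresponse.ContentType.StartsWith("text/html"))
    {
        // Hand the response stream straight to HtmlDocument.Load
        // instead of reading it into a string first
        var resultStream = webresponse.GetResponseStream();
        doc.Load(resultStream);
    }
}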