2013-05-30 41 views
-1

How to read HTML source code from an HTTPS URL

I want to read the HTML source code of an HTTPS URL in C# with the following code:

WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString("https://www.targetUrl.com");

This does not work for me, because I get an encoded HTML string. I tried using HtmlAgilityPack, but it did not help.

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); 
doc.LoadHtml(htmlString); 
+1

What do you mean by 'this does not work for me, I get an encoded HTML string'? – I4V

+0

I mean that it does not work for the HTTPS link https://www.targetUrl.com –

+0

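'WebClient.DownloadString' does not need to do anything special to download from an https address. What do you mean by "encoded"? How do you know it is encoded? What does it look like? – Snixtor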

Answers

3

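The URL returns a gzip-compressed response. WebClient does not decompress it by default, so you need to go down to the underlying HttpWebRequest class instead. This is shamelessly borrowed from feroze's answer over here - Automatically decompress gzip response via WebClient.DownloadData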

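class MyWebClient : WebClient
{
    // Ask the underlying HttpWebRequest to decompress gzip/deflate responses transparently.
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest baseRequest = base.GetWebRequest(address);
        // The cast yields null for non-HTTP schemes (for example file://), so guard against it.
        HttpWebRequest request = baseRequest as HttpWebRequest;
        if (request != null)
        {
            request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        }
        return baseRequest;
    }
}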
+0

Yes, it also works for http://example.com URLs, but not for https://example.com –

+0

@kavitaverma, then download the page with 'WebClient.DownloadData' and decompress it yourself. – I4V
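
The suggestion in the comment above could be sketched roughly as follows. This is only a minimal sketch, not code from the thread: the class name ManualGzipDownload and its helper methods are hypothetical, it assumes the page is UTF-8, and it assumes the body is either plain text or gzip data (detected by the gzip magic bytes 0x1F 0x8B).

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Hypothetical helper, named for illustration only.
static class ManualGzipDownload
{
    // Returns true when the payload starts with the gzip magic bytes 0x1F 0x8B.
    public static bool LooksLikeGzip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Decodes a response body to text, decompressing it first if it is gzip data.
    public static string DecodeBody(byte[] body, Encoding encoding)
    {
        if (!LooksLikeGzip(body))
        {
            return encoding.GetString(body);
        }

        using (var input = new MemoryStream(body))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Downloads the raw bytes and decodes them; UTF-8 is an assumption about the page.
    public static string DownloadHtml(string url)
    {
        using (var webClient = new WebClient())
        {
            byte[] raw = webClient.DownloadData(url);
            return DecodeBody(raw, Encoding.UTF8);
        }
    }
}
```

A call such as ManualGzipDownload.DownloadHtml("https://www.targetUrl.com") would then return the decoded HTML, which can be passed to doc.LoadHtml as in the question. Checking the magic bytes instead of the Content-Encoding header keeps the sketch working even when the server compresses the body without being asked to.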

0
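// Warning: this callback accepts every server certificate, which disables HTTPS validation.
ServicePointManager.ServerCertificateValidationCallback = delegate { return true; };
WebClient webClient = new WebClient();
string htmlString = webClient.DownloadString(url);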