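Loading Japanese web pages with HtmlAgilityPack
I have the following code that loads a web page and parses it with HtmlAgilityPack. It works for most pages, but when I try to load a Japanese page the encoding comes out wrong. How can I fix this? More precisely, how do I set the encoding based on the encoding the page itself declares?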
using System.Net;
using System.Text;
using HtmlAgilityPack;

class Program {
    private const string HttpMethod = "GET";
    private const string UserAgent =
        "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7";

    static void Main(string[] args) {
        var request = WebRequest.Create("http://infoseek.co.jp/") as HttpWebRequest;
        if (request == null)
            return;
        request.Method = HttpMethod;
        request.UserAgent = UserAgent;
        var response = request.GetResponse() as HttpWebResponse;
        if (response == null)
            return;
        var stream = response.GetResponseStream();
        var document = new HtmlDocument {
            OptionCheckSyntax = true,
            OptionFixNestedTags = true,
            OptionAutoCloseOnEnd = true,
            OptionDefaultStreamEncoding = Encoding.UTF8,
            OptionReadEncoding = true
        };
        // The encoding is hard-coded to UTF-8 here, whatever the page declares.
        document.Load(stream, Encoding.UTF8);
        var d = document.DocumentNode;
    }
}
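Yes, I know. But the problem is that this tag can only be accessed after the document has been loaded, whereas the encoding is needed while the document is being loaded. Do you have any ideas?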
Take a look at the last answer in this thread: http://htmlagilitypack.codeplex.com/discussions/60174. It uses System.Net.WebClient to retrieve the page as a string and then passes that string in to create the HtmlDocument. – devio
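One way around the "encoding is needed before the tag can be read" problem is to buffer the raw response bytes first, work out the charset from those bytes, and only then hand a decoded string to the parser. The following is a minimal sketch of that idea using only the .NET base class library. CharsetSniffer, Detect and Decode are hypothetical names made up for this illustration; they are not part of HtmlAgilityPack, and the linked thread may do this differently. The sketch assumes the charset is declared either in the Content-Type header or in a meta tag within the first 4096 bytes.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of HtmlAgilityPack): chooses an encoding for
// raw HTML bytes before any HTML parser sees them.
public static class CharsetSniffer
{
    // Matches "charset=x" inside a Content-Type header value.
    private static readonly Regex HeaderCharset = new Regex(
        @"charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    // Matches both <meta charset="x"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)", RegexOptions.IgnoreCase);

    public static Encoding Detect(byte[] body, string contentTypeHeader, Encoding fallback)
    {
        // 1. Prefer a charset sent in the HTTP Content-Type header.
        Encoding fromHeader = TryGet(HeaderCharset, contentTypeHeader ?? string.Empty);
        if (fromHeader != null) return fromHeader;

        // 2. Otherwise look for a <meta> declaration. The tag itself is plain
        //    ASCII in ASCII-compatible encodings, so an ASCII decode of the
        //    first bytes is enough to find it.
        int prefixLength = Math.Min(body.Length, 4096);
        string prefix = Encoding.ASCII.GetString(body, 0, prefixLength);
        Encoding fromMeta = TryGet(MetaCharset, prefix);
        if (fromMeta != null) return fromMeta;

        // 3. Nothing usable was declared.
        return fallback;
    }

    // Decodes the body with the detected encoding, falling back to UTF-8.
    public static string Decode(byte[] body, string contentTypeHeader)
    {
        return Detect(body, contentTypeHeader, Encoding.UTF8).GetString(body);
    }

    private static Encoding TryGet(Regex pattern, string text)
    {
        Match match = pattern.Match(text);
        if (!match.Success) return null;
        try
        {
            // On .NET Core / .NET 5+, legacy code pages such as Shift_JIS or
            // EUC-JP are only known after calling
            // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
            return Encoding.GetEncoding(match.Groups[1].Value);
        }
        catch (ArgumentException)
        {
            return null; // unknown or unsupported charset name
        }
    }
}
```

With the code from the question, this would mean copying the response stream into a MemoryStream, calling CharsetSniffer.Decode(bytes, response.ContentType), and passing the resulting string to document.LoadHtml(...) instead of calling document.Load(stream, Encoding.UTF8). The raw ContentType value is used here rather than HttpWebResponse.CharacterSet, because the latter can report a default even when the server sent no charset.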