2009-04-29 34 views
0

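I am using .NET's WebRequest to "screen scrape" my own pages as a temporary hack. Can .NET WebRequest/WebResponse correctly translate accent marks, diacritics, and entities?

This works well, but accented characters and characters with diacritics are not translated correctly.

I am wondering whether there is a way to translate them correctly using .NET's many built-in properties and methods.

Here is the code I use to grab the pages: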

private string getArticle(string urlToGet) 
{ 

    StreamReader oSR = null; 

    //Here's the work horse of what we're doing, the WebRequest object 
    //fetches the URL 
    WebRequest objRequest = WebRequest.Create(urlToGet); 

    //The WebResponse object gets the Request's response (the HTML) 
    WebResponse objResponse = objRequest.GetResponse(); 

    //Now dump the contents of our HTML in the Response object to a 
    //Stream reader 
    oSR = new StreamReader(objResponse.GetResponseStream()); 


    //And dump the StreamReader into a string... 
    string strContent = oSR.ReadToEnd(); 

    //Here we set up our Regular expression to snatch what's between the 
    //BEGIN and END 
    Regex regex = new Regex("<!-- content_starts_here //-->((.|\n)*?)<!-- content_ends_here //-->", 
     RegexOptions.IgnoreCase); 

    //Here we apply our regular expression to our string using the 
    //Match object. 
    Match oM = regex.Match(strContent); 

    //Bam! We return the value from our Match, and we're in business. 
    return oM.Value; 
} 
+1

Sorry to comment on something completely unrelated to the question, but you are using too many comments. Seriously. – Chris 2009-04-29 23:29:31

Answers

2

Try using:

System.Net.WebClient client = new System.Net.WebClient();
string html = client.DownloadString(urlToGet);
string decoded = System.Web.HttpUtility.HtmlDecode(html);

Also, check out client.Encoding, which sets the encoding WebClient uses when it downloads strings.
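The answer above combines two separate steps: decoding the response bytes with the right charset, and then decoding HTML entities. A minimal offline sketch of both steps follows. The class name `DecodeSketch`, its method names, and the choice of ISO-8859-1 are assumptions for illustration only. It calls `System.Net.WebUtility.HtmlDecode`, swapped in for `HttpUtility.HtmlDecode` so the sketch needs no reference to System.Web.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper class, not part of the original answer.
public static class DecodeSketch
{
    // Step 1: turn raw response bytes into text using the charset
    // the page was actually served in (an assumption the caller supplies).
    public static string DecodeBytes(byte[] raw, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(raw);
    }

    // Step 2: turn HTML entities such as &eacute; or &#252; into the
    // characters they stand for.
    public static string DecodeEntities(string html)
    {
        return WebUtility.HtmlDecode(html);
    }
}
```

For example, `DecodeSketch.DecodeBytes(new byte[] { 0x63, 0x61, 0x66, 0xE9 }, "ISO-8859-1")` yields "café", while decoding the same bytes as UTF-8 loses the é because 0xE9 alone is not valid UTF-8. `DecodeSketch.DecodeEntities("caf&eacute; &amp; M&#252;nchen")` yields "café & München".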

0

There is another way to handle this: use the second parameter of the StreamReader constructor, like this:

new StreamReader(webRequest.GetResponse().GetResponseStream(), 
       Encoding.GetEncoding("ISO-8859-1")); 

That will do it, as long as the page is actually served as ISO-8859-1.
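Hard-coding ISO-8859-1 only helps when the page really uses that charset. One possible variation, sketched below, reads the charset the server reports and uses the hard-coded encoding only as a fallback. The class `ResponseReader` and its methods `ResolveEncoding`, `ReadAll`, and `Fetch` are hypothetical names introduced for this sketch; `HttpWebResponse.CharacterSet` is the existing .NET property it relies on. Treat it as a starting point under those assumptions rather than a definitive implementation.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, not part of the original answer.
public static class ResponseReader
{
    // Map a charset name (for example from the Content-Type header) to an
    // Encoding, falling back when the name is missing or not recognised.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return fallback;
        }
        catch (NotSupportedException)
        {
            return fallback;
        }
    }

    // Read the whole stream as text using the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Fetch a page and decode it with the charset the server reported,
    // or ISO-8859-1 (the value used in the answer) if none is usable.
    public static string Fetch(string url)
    {
        WebRequest request = WebRequest.Create(url);
        using (WebResponse response = request.GetResponse())
        {
            var http = response as HttpWebResponse;
            string charset = http != null ? http.CharacterSet : null;
            Encoding encoding = ResolveEncoding(charset, Encoding.GetEncoding("ISO-8859-1"));
            return ReadAll(response.GetResponseStream(), encoding);
        }
    }
}
```

In the question's `getArticle` method, `ResponseReader.Fetch(urlToGet)` could stand in for the request, response, and StreamReader lines, with the result assigned to `strContent`. The `using` blocks also dispose the response and reader, which the original code leaves open.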