Reading non-English HTML pages with C#

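I want to find a Hebrew string in a website. The code that reads the page is attached.

I then tried reading the dumped file with a StreamReader, but I could not match strings in other languages. What should I do?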

    // requires: using System.IO; using System.Net;

    // used on each read operation
    byte[] buf = new byte[8192];

    // prepare the web page we will be asking for
    HttpWebRequest request = (HttpWebRequest)
        WebRequest.Create("http://www.webPage.co.il");

    // execute the request
    HttpWebResponse response = (HttpWebResponse)
        request.GetResponse();

    // we will read data via the response stream
    Stream resStream = response.GetResponseStream();

    int count = 0;
    FileStream fileDump = new FileStream(@"c:\dump.txt", FileMode.Create);
    do
    {
        count = resStream.Read(buf, 0, buf.Length);
        // write only the bytes actually read, not the whole buffer
        fileDump.Write(buf, 0, count);
    }
    while (count > 0); // any more data to read?

    fileDump.Close();


2010-06-09 AYBABTU

You are missing the appropriate encoding. Take a look at the details of the WebResponse.GetResponseStream method.

Update: the encoding for Hebrew (Windows) is code page 1255.

Encoding encode = System.Text.Encoding.GetEncoding(1255); // Hebrew (Windows) 

// Pipe the stream to a higher level stream reader with the required encoding format. 
StreamReader readStream = new StreamReader(resStream , encode);
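A minimal sketch of why the reader's encoding matters, assuming the page is served as windows-1255. The class name `EncodingDemo`, the helpers `ReadAll` and `FakeHebrewPage`, and the sample HTML are hypothetical; a `MemoryStream` stands in for the response stream so the example runs without a network call. On .NET Core and .NET 5+ the code page provider has to be registered first; on the older .NET Framework that line can be dropped.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: decode an entire stream with the given encoding.
    public static string ReadAll(Stream stream, Encoding encoding)
    {
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    // Hypothetical helper: simulate a page served as windows-1255 bytes.
    public static byte[] FakeHebrewPage()
    {
        Encoding hebrew = Encoding.GetEncoding(1255);
        return hebrew.GetBytes("<html><body>שלום עולם</body></html>");
    }

    public static void Main()
    {
        // Required on .NET Core / .NET 5+, where code page 1255 is not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] page = FakeHebrewPage();
        string needle = "שלום";

        string right = ReadAll(new MemoryStream(page), Encoding.GetEncoding(1255));
        string wrong = ReadAll(new MemoryStream(page), Encoding.UTF8);

        Console.WriteLine(right.Contains(needle)); // True: decoded with the page's encoding
        Console.WriteLine(wrong.Contains(needle)); // False: the Hebrew bytes are not valid UTF-8
    }
}
```

The same bytes give a match or no match depending only on the `Encoding` handed to the `StreamReader`.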


2010-06-09 18:04:29 volody

Still nothing... I think my problem may be related to the string I am searching for. I mean, I can't match it with str.Contains("other language code"), can I? What should I do? – AYBABTU 2010-06-09 18:12:47

I tried encoding the search string, but that failed too: string messageToFind = "otherLanguage"; UTF8Encoding utf8 = new UTF8Encoding(); Byte[] encodedBytes = utf8.GetBytes(messageToFind); messageToFind = encodedBytes.ToString(); – AYBABTU 2010-06-09 18:15:51
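A side note on the snippet in that last comment: it fails for a reason unrelated to the page, because calling `ToString()` on a `byte[]` returns the type name rather than the decoded text. A small sketch (the class and method names are hypothetical):

```csharp
using System;
using System.Text;

public static class SearchStringDemo
{
    // What the comment's code does: ToString() on an array yields its type name.
    public static string ViaToString(string text)
    {
        byte[] encodedBytes = new UTF8Encoding().GetBytes(text);
        return encodedBytes.ToString();
    }

    // Turning bytes back into text needs the encoding's GetString.
    public static string ViaGetString(string text)
    {
        UTF8Encoding utf8 = new UTF8Encoding();
        return utf8.GetString(utf8.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(ViaToString("otherLanguage"));  // System.Byte[]
        Console.WriteLine(ViaGetString("otherLanguage")); // otherLanguage
    }
}
```

The search string itself needs no re-encoding: a C# `string` is already Unicode, so only the bytes coming from the page have to be decoded with the right encoding.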

Solved it.

The problem was that I had chosen the wrong encoding: I picked UTF-8, which is not always the right answer :)

The key lines:

Encoding encode = System.Text.Encoding.GetEncoding("windows-1255"); 
StreamReader readStream = new StreamReader(ReceiveStream, encode);
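Rather than hard-coding the code page, one option is to use the charset the server declares, which `HttpWebResponse.CharacterSet` exposes. A hedged sketch, assuming the server's header is correct (it may not be, and the page's own meta tag can disagree); `CharsetDemo` and `ResolveEncoding` are hypothetical names:

```csharp
using System;
using System.Text;

public static class CharsetDemo
{
    // Hypothetical helper: map a declared charset name to an Encoding,
    // falling back when the name is missing or unknown.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return fallback;
        }
        try
        {
            // Some servers quote the value, e.g. charset="windows-1255".
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            return fallback; // unknown or unsupported charset name
        }
    }

    public static void Main()
    {
        // Required on .NET Core / .NET 5+ for code pages such as 1255.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Console.WriteLine(ResolveEncoding("windows-1255", Encoding.UTF8).CodePage);   // 1255
        Console.WriteLine(ResolveEncoding("no-such-charset", Encoding.UTF8).WebName); // utf-8
    }
}
```

With a real response this could be used as `new StreamReader(ReceiveStream, CharsetDemo.ResolveEncoding(response.CharacterSet, Encoding.UTF8))`, where `response` is the `HttpWebResponse` from the question.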


2010-06-09 19:28:03 AYBABTU

Please edit your original question and add this as your solution, for others who run into the same problem. – Marcote 2010-06-09 19:31:01
