2010-06-09 98 views

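Reading a non-English HTML page with C#

I want to find a Hebrew string in a web page. The code I use to read the page is attached below.

After dumping the page to a file, I try to read the file with a StreamReader, but I cannot match strings written in other languages. What should I do?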

    // requires: using System.IO; using System.Net;

    // buffer reused on each read operation
    byte[] buf = new byte[8192];

    // prepare the web page we will be asking for
    HttpWebRequest request = (HttpWebRequest)
        WebRequest.Create("http://www.webPage.co.il");

    // execute the request
    HttpWebResponse response = (HttpWebResponse)
        request.GetResponse();

    // we will read data via the response stream
    Stream resStream = response.GetResponseStream();

    int count = 0;
    FileStream fileDump = new FileStream(@"c:\dump.txt", FileMode.Create);
    do
    {
        // count is the number of bytes actually read (0 at end of stream)
        count = resStream.Read(buf, 0, buf.Length);
        // write only the bytes that were read, not the whole buffer
        fileDump.Write(buf, 0, count);
    }
    while (count > 0); // any more data to read?

    fileDump.Close();

Answers

You are missing the proper encoding; take a look at the details of the WebResponse.GetResponseStream method.

Update: the encoding for Hebrew (Windows) is code page 1255.

Encoding encode = System.Text.Encoding.GetEncoding(1255); // Hebrew (Windows) 

// Pipe the stream to a higher level stream reader with the required encoding format. 
StreamReader readStream = new StreamReader(resStream , encode); 
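
To illustrate why the reader's encoding matters, here is a minimal sketch that decodes the same windows-1255 bytes twice. It makes a few assumptions that are not in the original post: an in-memory stream stands in for the response stream so the example runs offline, and the sample word and the class name `EncodingDemo` are made up for illustration. On .NET Core / .NET 5+ code page 1255 is only available after registering `CodePagesEncodingProvider`; on the classic .NET Framework that line is not needed.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Decodes raw bytes the way the answer does: a StreamReader with an explicit encoding.
    public static string Decode(byte[] raw, Encoding encoding)
    {
        using (var stream = new MemoryStream(raw))
        using (var reader = new StreamReader(stream, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Assumption: running on .NET Core / .NET 5+. Drop this line on the classic .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding hebrew = Encoding.GetEncoding(1255); // Hebrew (Windows)

        // Illustrative Hebrew word ("shalom"), written with escapes so the source file encoding does not matter.
        string needle = "\u05E9\u05DC\u05D5\u05DD";

        // Simulate a page served as windows-1255.
        byte[] raw = hebrew.GetBytes("<p>" + needle + "</p>");

        Console.WriteLine(Decode(raw, hebrew).Contains(needle));        // prints "True"
        Console.WriteLine(Decode(raw, Encoding.UTF8).Contains(needle)); // prints "False"
    }
}
```

With the wrong encoding the Hebrew bytes decode to replacement characters, so `Contains` cannot find the search string even though the page does contain it.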

Still nothing... I think my problem may be with the string I am searching for. I mean, I cannot match: str.contains("other language code"); right? What should I do? – AYBABTU 2010-06-09 18:12:47

I tried encoding the search string, but that failed too: string messageToFind = "otherLanguage"; UTF8Encoding utf8 = new UTF8Encoding(); Byte[] encodedBytes = utf8.GetBytes(messageToFind); messageToFind = encodedBytes.ToString(); – AYBABTU 2010-06-09 18:15:51
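
One likely reason the attempt in the comment above fails is that calling `ToString()` on a `byte[]` returns the type name, not the decoded text. The sketch below shows this; the class name `ToStringDemo`, the helper `RoundTrip`, and the sample word are illustrative only. A .NET string is already Unicode, so the search string normally needs no re-encoding, provided the page bytes are decoded with the right encoding.

```csharp
using System;
using System.Text;

public static class ToStringDemo
{
    // Converts text to UTF-8 bytes and back using Encoding.GetString, the proper inverse of GetBytes.
    public static string RoundTrip(string text)
    {
        byte[] encodedBytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(encodedBytes);
    }

    public static void Main()
    {
        // Illustrative Hebrew word, written with escapes so the source file encoding does not matter.
        string messageToFind = "\u05E9\u05DC\u05D5\u05DD";
        byte[] encodedBytes = new UTF8Encoding().GetBytes(messageToFind);

        // Array.ToString() is not overridden, so this yields the type name.
        Console.WriteLine(encodedBytes.ToString());                   // prints "System.Byte[]"

        // GetString recovers the original text.
        Console.WriteLine(RoundTrip(messageToFind) == messageToFind); // prints "True"
    }
}
```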

Solved it.

The problem was choosing the wrong encoding. I had chosen UTF-8, which is not always the right answer :)

The key lines:

Encoding encode = System.Text.Encoding.GetEncoding("windows-1255"); 
StreamReader readStream = new StreamReader(ReceiveStream, encode); 
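
Since the right encoding varies from site to site, one option is to take it from the charset the server declares instead of hard-coding it. The sketch below is one possible approach, not the poster's code: `CharsetHelper.ResolveEncoding` is a hypothetical helper, and it assumes the declared charset is trustworthy, which is not guaranteed for every site, so a fallback encoding is passed in.

```csharp
using System;
using System.Text;

public static class CharsetHelper
{
    // Hypothetical helper: maps a declared charset name (e.g. "windows-1255") to an Encoding.
    // Returns the fallback when the name is missing or not recognized by the runtime.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return fallback;
        }

        try
        {
            return Encoding.GetEncoding(charset.Trim());
        }
        catch (ArgumentException)
        {
            // Encoding.GetEncoding throws ArgumentException for unknown or unsupported names.
            return fallback;
        }
    }
}
```

Assuming the `response` object from the question, it could be used as `Encoding encode = CharsetHelper.ResolveEncoding(response.CharacterSet, Encoding.GetEncoding("windows-1255"));` before constructing the StreamReader.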

Please edit your original question and add this as your solution, for others who run into the same problem. – Marcote 2010-06-09 19:31:01