2010-12-18 47 views
15

Android Java UTF-8 HttpClient problem

I have a strange character-encoding problem with a JSON array that I fetch from a web page. The server sends back this header:

Content-Type: text/javascript; charset=UTF-8

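Also, when I look at the JSON output in Firefox or any other browser, the Unicode characters display correctly. The response sometimes contains words from another language with accent marks and the like. However, when I pull it down and put it into a String in Java, I get strange question marks instead. Here is my code: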

HttpParams params = new BasicHttpParams(); 
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1); 
HttpProtocolParams.setContentCharset(params, "utf-8"); 
params.setBooleanParameter("http.protocol.expect-continue", false); 

HttpClient httpclient = new DefaultHttpClient(params); 

HttpGet httpget = new HttpGet("http://www.example.com/json_array.php"); 
HttpResponse response; 
    try { 
     response = httpclient.execute(httpget); 

     if(response.getStatusLine().getStatusCode() == 200){ 
      // Connection was established. Get the content. 

      HttpEntity entity = response.getEntity(); 
      // If the response does not enclose an entity, there is no need 
      // to worry about connection release 

      if (entity != null) { 
       // A Simple JSON Response Read 
       InputStream instream = entity.getContent(); 
       String jsonText = convertStreamToString(instream); 

       Toast.makeText(getApplicationContext(), "Response: "+jsonText, Toast.LENGTH_LONG).show(); 

      } 

     } 


    } catch (MalformedURLException e) { 
     Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - "+e.getMessage(), Toast.LENGTH_LONG).show(); 
     e.printStackTrace(); 
    } catch (IOException e) { 
     Toast.makeText(getApplicationContext(), "ERROR: IO Exception - "+e.getMessage(), Toast.LENGTH_LONG).show(); 
     e.printStackTrace(); 
    } catch (JSONException e) { 
     Toast.makeText(getApplicationContext(), "ERROR: JSON - "+e.getMessage(), Toast.LENGTH_LONG).show(); 
     e.printStackTrace(); 
    } 

private static String convertStreamToString(InputStream is) { 
    /* 
    * To convert the InputStream to String we use the BufferedReader.readLine() 
    * method. We iterate until the BufferedReader return null which means 
    * there's no more data to read. Each line will appended to a StringBuilder 
    * and returned as String. 
    */ 
    StringBuilder sb = new StringBuilder(); 

    String line; 
    try { 
     // Create the reader inside the try block so it is always initialized before use. 
     // UnsupportedEncodingException is an IOException, so the catch below handles it. 
     BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8")); 
     while ((line = reader.readLine()) != null) { 
      sb.append(line + "\n"); 
     } 
    } catch (IOException e) { 
     e.printStackTrace(); 
    } finally { 
     try { 
      is.close(); 
     } catch (IOException e) { 
      e.printStackTrace(); 
     } 
    } 
    return sb.toString(); 
} 

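As you can see, I am specifying UTF-8 in the InputStreamReader, but every time I view the returned JSON text in a Toast it contains strange question marks. I am thinking that I need to send the InputStream to a byte[] instead?

Thanks in advance for any help.

Answers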

37

Try this:

if (entity != null) { 
    // A Simple JSON Response Read 
    // InputStream instream = entity.getContent(); 
    // String jsonText = convertStreamToString(instream); 

    String jsonText = EntityUtils.toString(entity, HTTP.UTF_8); 

    // ... toast code here 
} 
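
The question asks whether the stream should go through a byte[] first. The following is a minimal plain-Java sketch of that idea: drain the stream into a byte array, then decode it in a single step with an explicit charset. It is not how EntityUtils is implemented; the class name StreamDecodeSketch and the helper readFully are hypothetical, and StandardCharsets assumes Java 7 or Android API 19 and later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamDecodeSketch {
    // Hypothetical helper: reads every byte of the stream, then decodes once.
    public static String readFully(InputStream in, Charset charset) throws IOException {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Decoding the complete byte array avoids splitting a multi-byte character.
            return new String(buffer.toByteArray(), charset);
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body; \u00e9 is an accented "e".
        byte[] body = "[\"caf\u00e9\"]".getBytes(StandardCharsets.UTF_8);
        String text = readFully(new ByteArrayInputStream(body), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

Unlike the line-by-line loop in the question, this keeps the body's original line endings untouched.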
+0

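Thanks for the reply. I added your change and imported the extra Apache classes for EntityUtils, but now the app terminates unexpectedly on the EntityUtils.toString line. The program compiles and runs, but is there something I need to do to the entity before calling toString? – 2010-12-18 22:42:12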

+0

Never mind. I'm an idiot and had messed up my URL. It works! The characters are rendered correctly! – 2010-12-18 22:47:47

+3

@Michael: This answer is very good; if I had asked this question, I would accept it. – SK9 2012-03-24 20:25:54

5

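@Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.

My guesses are:

  1. The server is putting a UTF byte order mark (BOM) at the start of the stream. The standard Java UTF-8 character decoder does not remove the BOM, so it may end up in the resulting String. (However, the code for EntityUtils does not seem to do anything with BOMs either.)
  2. Your convertStreamToString reads the character stream one line at a time and then reassembles it using a hard-wired '\n' as the end-of-line marker. If you are going to write the result to an external file or application, you should probably use the platform-specific end-of-line marker.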
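
The first guess can be checked with a small sketch. It assumes a standard JVM, where decoding the UTF-8 BOM bytes EF BB BF yields the character U+FEFF at the start of the String; the class name BomSketch and the helper stripBom are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class BomSketch {
    // U+FEFF is what the UTF-8 BOM bytes EF BB BF decode to.
    private static final char BOM = '\uFEFF';

    // Hypothetical helper: drops a single leading BOM character, if present.
    public static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // A two-character JSON body "[]" preceded by a UTF-8 BOM.
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '[', ']'};
        String decoded = new String(withBom, StandardCharsets.UTF_8);

        // The decoder keeps the BOM, so the String is one character longer than the JSON.
        System.out.println(decoded.length());
        System.out.println(stripBom(decoded));
    }
}
```

If a BOM is the culprit, the stray character appears only once, at the very start of the text, rather than in place of each accented letter.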
1

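It is just that your convertStreamToString does not honor the encoding set in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether an encoding is set in the HttpResponse and, if so, uses that encoding. It only falls back to the encoding passed as the parameter (HTTP.UTF_8 in this case) when no encoding is set in the entity.

So you could say that the HTTP.UTF_8 you pass as the parameter is never used, because it is the wrong encoding. Here is your code updated with the helper method from EntityUtils.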

HttpEntity entity = response.getEntity();
String charset = getContentCharSet(entity);
InputStream instream = entity.getContent();
String jsonText = convertStreamToString(instream, charset);

private static String getContentCharSet(final HttpEntity entity) throws ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    String charset = null;
    if (entity.getContentType() != null) {
        // Look for a "charset" parameter on the first Content-Type element.
        HeaderElement[] values = entity.getContentType().getElements();
        if (values.length > 0) {
            NameValuePair param = values[0].getParameterByName("charset");
            if (param != null) {
                charset = param.getValue();
            }
        }
    }
    // Fall back to UTF-8 when the server does not declare a charset.
    return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
}



private static String convertStreamToString(InputStream is, String encoding) {
    /*
    * Read the stream line by line with BufferedReader.readLine() until it
    * returns null, which means there is no more data to read. Each line is
    * appended to a StringBuilder and the result is returned as a String.
    */
    StringBuilder sb = new StringBuilder();
    try {
        // Create the reader inside the try block so it is always initialized before use.
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding));
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\n");
        }
    } catch (IOException e) {
        // UnsupportedEncodingException is an IOException, so it is handled here too.
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}
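
To illustrate why the charset passed to the reader matters, here is a small plain-Java sketch that decodes the same UTF-8 bytes with two different charsets. It only demonstrates the general effect of a mismatch; it does not reproduce the asker's server response, and the class name CharsetMismatchSketch and the helper decode are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchSketch {
    // Hypothetical helper: decodes bytes with the charset of the given name.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // U+00E9 (an accented "e") is the two bytes C3 A9 in UTF-8.
        byte[] utf8Bytes = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // Matching charset: one character comes back.
        System.out.println(decode(utf8Bytes, "UTF-8").length());

        // Mismatched charset: each byte becomes its own character.
        System.out.println(decode(utf8Bytes, "ISO-8859-1").length());
    }
}
```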
0

Archimed's answer is correct. However, this can also be achieved simply by supplying an additional header in the HTTP request:

Accept-charset: utf-8 

There is no need to remove anything or use any other library.

For example,

GET / HTTP/1.1 
Host: www.website.com 
Connection: close 
Accept: text/html 
Upgrade-Insecure-Requests: 1 
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36 
DNT: 1 
Accept-Encoding: gzip, deflate, sdch 
Accept-Language: en-US,en;q=0.8 
Accept-Charset: utf-8 

Most likely your request does not have any Accept-Charset header.
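
As a rough sketch of setting that header from Java, the example below uses java.net.HttpURLConnection rather than the Apache client from the question, so that it is self-contained. The class name AcceptCharsetSketch and the helper prepare are hypothetical, the URL is the placeholder from the question, and whether a server honors Accept-Charset is up to the server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class AcceptCharsetSketch {
    // Hypothetical helper: builds a connection with the Accept-Charset header set.
    public static HttpURLConnection prepare(String url) throws IOException {
        // openConnection() only creates the object; nothing is sent until
        // connect() is called or the response is read.
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Accept-Charset", "utf-8");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://www.example.com/json_array.php");
        // Shows the header that will be sent with the request.
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```

With the HttpGet object from the question, the equivalent call would presumably be httpget.setHeader("Accept-Charset", "utf-8").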

0

Extract the charset from the response's Content-Type field. You can use the following method to do so:

private static String extractCharsetFromContentType(String contentType) { 
    if (TextUtils.isEmpty(contentType)) return null; 

    Pattern p = Pattern.compile(".*charset=([^\\s^;^,]+)"); 
    Matcher m = p.matcher(contentType); 

    if (m.find()) { 
     try { 
      return m.group(1); 
     } catch (Exception e) { 
      return null; 
     } 
    } 

    return null; 
} 

Then create the InputStreamReader using the extracted charset:

String charsetName = extractCharsetFromContentType(connection.getContentType()); 

InputStreamReader inReader = (TextUtils.isEmpty(charsetName) ? new InputStreamReader(inputStream) : 
        new InputStreamReader(inputStream, charsetName)); 
      BufferedReader reader = new BufferedReader(inReader);
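
A variation on the approach above, sketched without Android's TextUtils so that it runs on a plain JVM: it also tolerates a quoted charset value and falls back to a caller-supplied default when the name is missing or not supported. The class name ContentTypeCharsetSketch, the helper charsetOf, and the regular expression are assumptions made for illustration, not part of the answer's code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharsetSketch {
    // Matches charset=UTF-8 as well as charset="utf-8", ignoring case.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;,\"]+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the declared charset, or the fallback.
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    public static void main(String[] args) {
        Charset cs = charsetOf("text/javascript; charset=UTF-8", StandardCharsets.ISO_8859_1);
        System.out.println(cs.name());
    }
}
```

Returning a Charset instead of a name means the InputStreamReader constructor that takes a Charset can be used, which does not throw UnsupportedEncodingException.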