Android Java UTF-8 HttpClient Problem

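I'm having a strange character encoding problem with a JSON array that I fetch from a web page. The server is sending back this header:

Content-Type: text/javascript; charset=UTF-8

Also, I can look at the JSON output in Firefox or any other browser and the Unicode characters display properly. The response will sometimes contain words from another language, with accent marks and the like. However, when I pull it down and put it into a String in Java, I get those weird question marks. Here is my code: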

HttpParams params = new BasicHttpParams(); 
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1); 
HttpProtocolParams.setContentCharset(params, "utf-8"); 
params.setBooleanParameter("http.protocol.expect-continue", false); 

HttpClient httpclient = new DefaultHttpClient(params); 

HttpGet httpget = new HttpGet("http://www.example.com/json_array.php"); 
HttpResponse response;
try {
    response = httpclient.execute(httpget);

    if (response.getStatusLine().getStatusCode() == 200) {
        // Connection was established. Get the content.

        HttpEntity entity = response.getEntity();
        // If the response does not enclose an entity, there is no need
        // to worry about connection release

        if (entity != null) {
            // A Simple JSON Response Read
            InputStream instream = entity.getContent();
            String jsonText = convertStreamToString(instream);

            Toast.makeText(getApplicationContext(), "Response: " + jsonText, Toast.LENGTH_LONG).show();
        }
    }
} catch (MalformedURLException e) {
    Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
} catch (IOException e) {
    Toast.makeText(getApplicationContext(), "ERROR: IO Exception - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
} catch (JSONException e) {
    Toast.makeText(getApplicationContext(), "ERROR: JSON - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
}

private static String convertStreamToString(InputStream is) {
    /*
     * To convert the InputStream to String we use the BufferedReader.readLine()
     * method. We iterate until the BufferedReader returns null, which means
     * there's no more data to read. Each line is appended to a StringBuilder
     * and returned as a String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
    } catch (UnsupportedEncodingException e1) {
        // Every Java platform supports UTF-8, so this should be unreachable.
        // Throwing also lets the compiler see that reader is always assigned.
        throw new IllegalStateException(e1);
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

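As you can see, I am specifying UTF-8 on the InputStreamReader, but every time I view the returned JSON text via the Toast it has strange question marks. I am thinking that I need to send the InputStream to a byte[] instead?

Thanks in advance for any help.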
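
For reference, the byte[] idea from the question could be sketched roughly as below, using only the plain JDK. The class name, the readFully helper and the sample JSON are made up for illustration, and StandardCharsets assumes Java 7 or later. Decoding a byte[] as UTF-8 gives the same text as an InputStreamReader configured with UTF-8, so this on its own is unlikely to change the result.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ByteArrayDecode {
    // Hypothetical helper: buffers the whole stream, then decodes it in one step.
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sample body standing in for the server response (an assumption).
        byte[] body = "[\"caf\u00e9\"]".getBytes(StandardCharsets.UTF_8);
        System.out.println(readFully(new ByteArrayInputStream(body)));
    }
}
```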

2010-12-18 Michael Taggart

Try this:

if (entity != null) { 
    // A Simple JSON Response Read 
    // InputStream instream = entity.getContent(); 
    // String jsonText = convertStreamToString(instream); 

    String jsonText = EntityUtils.toString(entity, HTTP.UTF_8); 

    // ... toast code here 
}

2010-12-18 22:32:28

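Thanks for the reply. I added your change and imported the extra Apache classes for EntityUtils, but now the app terminates unexpectedly on the EntityUtils.toString line. The program compiles and runs, but is there something I need to do to the entity before calling toString on it? – 2010-12-18 22:42:12

Never mind. I was an idiot and had messed up my URL. It works! The characters are rendered correctly! – 2010-12-18 22:47:47

@Michael: This answer is very good, and I would accept it if I had asked the question. – SK9 2012-03-24 20:25:54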

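@Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.

My guesses are:

1. The server is putting a UTF byte order mark (BOM) at the start of the stream. The standard Java UTF-8 character decoder does not remove the BOM, so chances are that it ends up in the resulting String. (However, the code for EntityUtils doesn't seem to do anything with BOMs either.)
2. Your convertStreamToString reads the character stream a line at a time and then reassembles it using a hard-wired '\n' as the end-of-line marker. If you are going to write the result to an external file or application, you should probably use a platform-specific end-of-line marker.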
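
Both guesses above can be explored with a small plain-JDK sketch. The class and the readUtf8 helper are hypothetical, not part of the question's code or of EntityUtils: the helper copies characters in chunks instead of line by line, so the original line endings survive, and it drops a leading U+FEFF if the decoder left one in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BomAwareReader {
    // Hypothetical helper: decodes UTF-8, keeps the original line endings,
    // and drops a leading byte order mark (U+FEFF) if one is present.
    static String readUtf8(InputStream in) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            sb.append(chunk, 0, n);
        }
        if (sb.length() > 0 && sb.charAt(0) == '\uFEFF') {
            sb.deleteCharAt(0);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 encoding of the BOM; it is followed by "a\r\nb".
        byte[] body = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '\r', '\n', 'b'};
        String text = readUtf8(new ByteArrayInputStream(body));
        System.out.println(text.length()); // prints 4: the BOM is gone, "\r\n" is kept
    }
}
```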

2010-12-19 00:17:07

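It's just that your convertStreamToString does not honor the encoding set in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether an encoding is set in the HttpResponse and, if there is one, uses that encoding. It only falls back to the encoding passed as the parameter (in this case HTTP.UTF_8) if no encoding is set in the entity.

So you can say that your HTTP.UTF_8 is passed in as the parameter but never gets used, because it is the wrong encoding. Here is an update to your code using the helper method from EntityUtils.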

HttpEntity entity = response.getEntity();
String charset = getContentCharSet(entity);
InputStream instream = entity.getContent();
String jsonText = convertStreamToString(instream, charset);

private static String getContentCharSet(final HttpEntity entity) throws ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    String charset = null;
    if (entity.getContentType() != null) {
        HeaderElement[] values = entity.getContentType().getElements();
        if (values.length > 0) {
            NameValuePair param = values[0].getParameterByName("charset");
            if (param != null) {
                charset = param.getValue();
            }
        }
    }
    return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
}



private static String convertStreamToString(InputStream is, String encoding) {
    /*
     * To convert the InputStream to String we use the
     * BufferedReader.readLine() method. We iterate until the BufferedReader
     * returns null, which means there's no more data to read. Each line is
     * appended to a StringBuilder and returned as a String.
     */
    BufferedReader reader;
    try {
        reader = new BufferedReader(new InputStreamReader(is, encoding));
    } catch (UnsupportedEncodingException e1) {
        // Unknown charset name: fail loudly instead of continuing with an
        // unassigned reader (which would not even compile).
        throw new IllegalArgumentException("Unsupported encoding: " + encoding, e1);
    }
    StringBuilder sb = new StringBuilder();

    String line;
    try {
        while ((line = reader.readLine()) != null) {
            sb.append(line + "\n");
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}
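
To see why the charset handed to InputStreamReader matters, here is a small plain-JDK sketch. It does not reproduce the server from the question: the sample word and the choice of ISO-8859-1 are assumptions for illustration only. When bytes written in one encoding are decoded as UTF-8, malformed sequences are replaced with U+FFFD, which many fonts draw as a question mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Hypothetical helper: decodes the bytes with the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café", sample text only
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with a charset other than the one used to encode.
        String wrong = decode(latin1, StandardCharsets.UTF_8);
        // Decoding with the charset that was actually used.
        String right = decode(latin1, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.equals(original)); // false
        System.out.println(right.equals(original)); // true
    }
}
```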

2014-07-28 17:11:38

Archimed's answer is correct. However, the same thing can be achieved simply by providing an additional header in the HTTP request:

Accept-charset: utf-8

No need to remove anything or use any other library.

For example,

GET / HTTP/1.1 
Host: www.website.com 
Connection: close 
Accept: text/html 
Upgrade-Insecure-Requests: 1 
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36 
DNT: 1 
Accept-Encoding: gzip, deflate, sdch 
Accept-Language: en-US,en;q=0.8 
Accept-Charset: utf-8

Most probably your request doesn't have any Accept-Charset header.
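
As a rough sketch of setting that header from Java, the snippet below uses the JDK's HttpURLConnection rather than the Apache client from the question. The class and the prepare helper are made up for illustration, and the URL is the placeholder from the question. The request is only prepared, not sent. Whether the server honors Accept-Charset is up to the server.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

class AcceptCharsetExample {
    // Hypothetical helper: prepares (but does not send) a GET request
    // that asks the server for a UTF-8 response.
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept-Charset", "utf-8");
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = prepare("http://www.example.com/json_array.php");
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```

With the HttpGet object from the question, the equivalent would presumably be httpget.setHeader("Accept-Charset", "utf-8") before calling execute.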

2015-12-01 17:53:13

Extract the charset from the response's Content-Type field. You can use the following method to do so:

private static String extractCharsetFromContentType(String contentType) {
    if (TextUtils.isEmpty(contentType)) return null;

    // Capture the value after "charset=", up to whitespace, ';' or ','.
    Pattern p = Pattern.compile("charset=([^\\s;,]+)", Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(contentType);

    // Returns null when the header carries no charset parameter.
    return m.find() ? m.group(1) : null;
}

Then create the InputStreamReader with the extracted charset:

String charsetName = extractCharsetFromContentType(connection.getContentType());

InputStreamReader inReader = TextUtils.isEmpty(charsetName)
        ? new InputStreamReader(inputStream)
        : new InputStreamReader(inputStream, charsetName);
BufferedReader reader = new BufferedReader(inReader);
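
One thing to keep in mind is that new InputStreamReader(inputStream) without a charset uses the platform default encoding. A possible variation, sketched below with the plain JDK only (no TextUtils), returns a Charset object and falls back to UTF-8 when the header is missing or names an unknown charset. The class name, the charsetOf helper and the UTF-8 fallback are assumptions for illustration, not part of the answer above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentTypeCharset {
    // Matches charset=VALUE, tolerating spaces and optional quotes around VALUE.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;,\"]+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the declared charset, or UTF-8 when the
    // header is missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType) {
        if (contentType == null) return StandardCharsets.UTF_8;
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) return StandardCharsets.UTF_8;
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/javascript; charset=UTF-8"));
        System.out.println(charsetOf("text/html; charset=ISO-8859-1"));
        System.out.println(charsetOf("application/json"));
    }
}
```

The resulting Charset can be passed straight to new InputStreamReader(inputStream, charset), which avoids the UnsupportedEncodingException of the String-based constructor.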

2015-12-03 00:23:12