閱讀在線網絡的源HTML的奇怪行爲

我有一個問題，我猜是因爲我用來讀取網頁的對象，在這種情況下，這兩個對象都是Retrofit2和HttpURLConnection。閱讀在線網絡的源HTML的奇怪行爲

情況是：我需要閱讀一個沒有API（不是我的）的網頁，並提取整個頁面的HTML，但我遇到了兩個我嘗試過的工具（前面提到）格式。

網絡本身具有此meta標籤：

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

因此它顯示的與他們的話（這是在西班牙）的重音符號。你可以清楚地在網絡上看到重音符號是很好的瀏覽器，Mozilla或其他任何瀏覽器解釋：

您還可以看到在HTML文件中的重音符號：

但這裏的問題時刺中我在我的背上：

伏法：

原料：

而現在，我會告訴你我試過至今。第一個電話是Retrofit2。

客戶（沒有任何轉換器，因爲我想生的（這聽起來不錯，順便說一句））：

public static Retrofit getRaiaApi() { 
    if (raiaRetrofit == null) { 
     raiaRetrofit = new Retrofit.Builder() 
       .baseUrl(RAIA_URL) 
       .build(); 
    } 
    return raiaRetrofit; 
}

POST方法：

@Headers({ 
     "Content-Type: application/x-www-form-urlencoded;charset=utf-8" 
}) 
@FormUrlEncoded 
@POST("index.php?operacion=consulta") 
Call<ResponseBody> postRaiaSearch(@Header("Cookie") String cookie, @Field("microchip") String microchip);

而且的致電：

private void nextRaiaSearch(String sessionCookie) { 
    callRaiaSearch = apiInterfaceRaia.postRaiaSearch(sessionCookie, chipInput); 
    callRaiaSearch.enqueue(new Callback<ResponseBody>() { 
     @Override 
     public void onResponse(Call<ResponseBody> call, Response<ResponseBody> response) { 
      Log.v("call", "onResponse"); 
      try { 
       String html = response.body().string(); 
      } catch (IOException e) { 
       e.printStackTrace(); 
      } 
     } 

     @Override 
     public void onFailure(Call<ResponseBody> call, Throwable t) { 
      Log.v("call", "onFailure"); 
     } 
    }); 
}

但正如我之前解釋的那樣，這給了我那些帶有錯誤的HTML。

然後，我想：「呃，也許Retrofit正在轉換一些東西，這不是真正的網絡的原始來源，所以讓我們試試其他的東西吧。

並試圖用一個簡單的HttpURLConnection。

private void nextRaiaSearch(String sessionCookie) throws IOException { 
    URL url = new URL("https://www.raia.es/index.php?operacion=consulta"); 
    HttpURLConnection connection = (HttpURLConnection) url.openConnection(); 
    OutputStreamWriter request; 
    StringBuilder response = new StringBuilder(); 

    connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); 
    connection.setRequestProperty("Cookie", sessionCookie); 
    connection.setRequestMethod("POST"); 
    connection.setConnectTimeout(60000); 
    connection.setReadTimeout(10000); 

    request = new OutputStreamWriter(connection.getOutputStream()); 
    request.write("microchip=" + chipInput); 
    request.flush(); 
    request.close(); 

    String line; 
    InputStreamReader input = new InputStreamReader(connection.getInputStream()); 
    BufferedReader reader = new BufferedReader(input); 
    while ((line = reader.readLine()) != null) { 
     response.append(line).append("\n"); 
    } 
    input.close(); 
    reader.close(); 

    String html = response.toString(); 
}

，但結果是完全一樣的：

我缺少的東西？我應該使用其他工具嗎？

來源

2017-10-06 JMedinilla

的問題是服務器端...發送字符在ISO-8859-1，但它並沒有在標題中提到這件事... – Selvin

是的，問題是，網絡來自當地政府，而不是我的，我什麼也沒有改變。這就是爲什麼我只想讀取源HTML，但它沒有成功，我猜。 – JMedinilla

與你的'HttpUrlConnection'代碼嘗試設置'InputStreamReader'構造函數的第二個參數來強制給定的編碼 – Selvin

您可以使用InputStreamReader指定由服務器提供的編碼。

例如：

InputStreamReader input = new InputStreamReader(connection.getInputStream(), Charset.forName("ISO-8859-1"));

我希望工程

來源

2017-10-06 10:18:21

你有沒有試過玩輸出字符串？

喜歡的東西

String html = new String(response.toString().getBytes(), "UTF-8");

來源

2017-10-06 09:52:48

我已經嘗試過使用該字符集，並且根本不寫任何字符集，但請求標題在響應上沒有任何變化。 – JMedinilla

給我的答案再試一次 –

這是我第一次嘗試。但是當你有迴應時，它已經有了錯誤，所以改變字符集將不會做任何事情。無論如何感謝您的答案。 – JMedinilla

閱讀在線網絡的源HTML的奇怪行爲

回答

相關問題