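How do I get page source with HttpClient when the content is gzip-encoded?

I am using commons-httpclient 3.1 to read HTML page source. It works fine except for pages whose content encoding is gzip: for those I receive incomplete page source. How can I get the page source with HttpClient when the response is gzip-encoded?

For that page, Firefox shows the content encoding as gzip.

Details are below.

Response headers: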

status code: HTTP/1.1 200 OK 
Date = Wed, 20 Jul 2011 11:29:38 GMT 
Content-Type = text/html; charset=UTF-8 
X-Powered-By = JSF/1.2 
Set-Cookie = JSESSIONID=Zqq2Tm8V74L1LJdBzB5gQzwcLQFx1khXNvcnZjNFsQtYw41J7JQH!750321853; path=/; HttpOnly 
Transfer-Encoding = chunked 
Content-Length = -1

My code to read the response:

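import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.httpclient.params.HttpMethodParams;

// url (String) and strHtmlContent (String) are declared elsewhere.
HttpClient httpclient = new HttpClient();
httpclient.getParams().setParameter("http.connection.timeout",
        Integer.valueOf(50000000));
httpclient.getParams().setParameter("http.socket.timeout",
        Integer.valueOf(50000000));

// Create a method instance.
GetMethod method = new GetMethod(url);

// Provide a custom retry handler if necessary.
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));
BufferedReader reader = null;
try {
    // Execute the method.
    int statusCode = httpclient.executeMethod(method);

    if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Method failed: " + method.getStatusLine());
        strHtmlContent = null;
    } else {
        // Read the raw body line by line; no gzip decoding happens here.
        // Note: the response headers above declare charset=UTF-8, not ISO8859_8.
        InputStream is = method.getResponseBodyAsStream();
        reader = new BufferedReader(new InputStreamReader(is, "ISO8859_8"));
        String line = null;
        StringBuilder sbResponseBody = new StringBuilder();
        while ((line = reader.readLine()) != null) {
            sbResponseBody.append(line).append("\n");
        }
        strHtmlContent = sbResponseBody.toString();
    }
} finally {
    // Close the reader and return the connection to the connection manager.
    if (reader != null) {
        reader.close();
    }
    method.releaseConnection();
}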

Source

2011-07-20 mahesh
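
For readers who have to stay on commons-httpclient 3.1, the decoding step can also be done by hand. What follows is only a minimal sketch using JDK classes: `GzipBodyDecoder`, `unwrap`, `readBody` and `gzip` are hypothetical names invented for illustration, and the sketch assumes the server labels compressed bodies with `Content-Encoding: gzip` (or `x-gzip`).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of commons-httpclient.
class GzipBodyDecoder {

    /** Wraps the raw body in a GZIPInputStream only when Content-Encoding says gzip. */
    static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null) {
            String enc = contentEncoding.trim();
            if (enc.equalsIgnoreCase("gzip") || enc.equalsIgnoreCase("x-gzip")) {
                return new GZIPInputStream(raw);
            }
        }
        return raw; // no (or unknown) encoding: hand the bytes back untouched
    }

    /** Reads the (possibly gzip-encoded) body to the end and decodes it with the given charset. */
    static String readBody(InputStream raw, String contentEncoding, Charset charset)
            throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(unwrap(raw, contentEncoding), charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    /** Demo helper: gzip-compresses text in memory to stand in for a server response. */
    static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(charset));
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body>hello</body></html>";
        byte[] compressed = gzip(html, StandardCharsets.UTF_8);
        String decoded = readBody(new ByteArrayInputStream(compressed), "gzip",
                StandardCharsets.UTF_8);
        System.out.println(decoded);
    }
}
```

With commons-httpclient 3.1 this would presumably be wired up by calling `method.setRequestHeader("Accept-Encoding", "gzip")` before `executeMethod`, then passing `method.getResponseBodyAsStream()` and the value of `method.getResponseHeader("Content-Encoding")` (which may be null) to `readBody`, along with the charset from the Content-Type header.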

Upgrade to HttpClient 4.1. It should handle compression transparently.

Source

2011-07-20 11:59:01 pap

Thanks for the reply. I tried HttpClient 4.1 and got a "Not in GZIP format" exception. – mahesh

Curious. The headers you posted in the question don't actually specify gzip encoding. Are you sure the response really is gzip-encoded? – pap

While trying it I got the following response:
----------------------------------------
Response is gzip encoded
----------------------------------------
Date = Fri, 22 Jul 2011 07:58:44 GMT
Content-Encoding = gzip
Content-Length = 5856
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=9D2hTptKQ1PqKsMvHcYLyFTVlQ6fTNWK3VtcQcVmBHqFb9fSbvYL!750321853; path=/; HttpOnly
Content length = -1
Content encoding = null
Fatal transport error: Not in GZIP format
java.io.IOException: Not in GZIP format
– mahesh
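
The IOException in the log above is what `java.util.zip.GZIPInputStream` raises ("Not in GZIP format") when the bytes it is given do not begin with the gzip magic number `0x1f 0x8b`. One possible cause, not confirmed in this thread, is that the body had already been decompressed before being wrapped a second time. A defensive option is to peek at the first two bytes before wrapping. The sketch below uses only JDK classes; `GzipSniffer`, `maybeGunzip`, `readAll` and `gzip` are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
class GzipSniffer {

    /** Gunzips only if the data starts with the gzip magic number 0x1f 0x8b. */
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] head = new byte[2];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 2 bytes or EOF.
        while (n < 2) {
            int r = in.read(head, n, 2 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // push the peeked bytes back for the real consumer
        }
        boolean gzipMagic = n == 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
        return gzipMagic ? new GZIPInputStream(in) : in;
    }

    /** Reads a stream to the end and returns its bytes. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Demo helper: gzip-compresses bytes in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "<html>already decoded</html>".getBytes(StandardCharsets.UTF_8);
        // Both the compressed and the uncompressed input come back as the same text.
        byte[] fromGzip = readAll(maybeGunzip(new ByteArrayInputStream(gzip(plain))));
        byte[] fromPlain = readAll(maybeGunzip(new ByteArrayInputStream(plain)));
        System.out.println(new String(fromGzip, StandardCharsets.UTF_8));
        System.out.println(new String(fromPlain, StandardCharsets.UTF_8));
    }
}
```

Sniffing the magic number is a heuristic: it avoids the exception when the stream is already plain text, but the `Content-Encoding` header remains the authoritative signal when it is available.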

I just ran into this problem and solved it as follows:

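import java.io.*;
import java.net.*;
import java.util.zip.GZIPInputStream;

URL url = new URL("http://www.megadevs.com");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
// Request gzip explicitly so the Content-Encoding check below is meaningful.
conn.setRequestProperty("Accept-Encoding", "gzip");

// Unwrap only when the server actually answered with gzip.
InputStream in = conn.getInputStream();
if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
    in = new GZIPInputStream(in);
}

// Decode through a Reader instead of casting bytes to chars; UTF-8 is an assumption.
Reader reader = new InputStreamReader(in, "UTF-8");
StringBuilder page = new StringBuilder();
int value;
while ((value = reader.read()) != -1) {
    page.append((char) value);
}
reader.close();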

Hope this helps.

Source

2012-06-04 14:11:07 Sebastiano
