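I am using commons-httpclient 3.1 to read the HTML source of a page. It works fine except for pages whose content encoding is gzip; for those I get incomplete page source. How can I fetch the page source of a gzip-encoded page with HttpClient?
For that page, Firefox shows the content encoding as gzip.
Details are below.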
Response headers:
status code: HTTP/1.1 200 OK
Date = Wed, 20 Jul 2011 11:29:38 GMT
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=Zqq2Tm8V74L1LJdBzB5gQzwcLQFx1khXNvcnZjNFsQtYw41J7JQH!750321853; path=/; HttpOnly
Transfer-Encoding = chunked
Content-Length = -1
My code to read the response:
HttpClient httpclient = new HttpClient();
httpclient.getParams().setParameter("http.connection.timeout",
        Integer.valueOf(50000000));
httpclient.getParams().setParameter("http.socket.timeout",
        Integer.valueOf(50000000));
// Create a method instance.
GetMethod method = new GetMethod(url);
// Provide a custom retry handler if necessary.
method.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
        new DefaultHttpMethodRetryHandler(3, false));
BufferedReader reader = null;
// Execute the method.
int statusCode = httpclient.executeMethod(method);
if (statusCode != HttpStatus.SC_OK) {
    System.err.println("Method failed: "
            + method.getStatusLine());
    strHtmlContent = null;
} else {
    // Read the response body line by line.
    InputStream is = method.getResponseBodyAsStream();
    reader = new BufferedReader(new InputStreamReader(is, "ISO8859_8"));
    String line = null;
    StringBuffer sbResponseBody = new StringBuffer();
    while ((line = reader.readLine()) != null) {
        sbResponseBody.append(line).append("\n");
    }
    strHtmlContent = sbResponseBody.toString();
}
Thanks for the reply. I tried using httpclient 4.1 and got a "Not in GZIP format" exception. – mahesh
Curious. The header section you posted in the question does not actually specify gzip encoding. Are you sure it really is gzip? – pap
While trying, I got the following response:
----------------------------------------
Response is gzip encoded
----------------------------------------
Date = Fri, 22 Jul 2011 07:58:44 GMT
Content-Encoding = gzip
Content-Length = 5856
Content-Type = text/html; charset=UTF-8
X-Powered-By = JSF/1.2
Set-Cookie = JSESSIONID=9D2hTptKQ1PqKsMvHcYLyFTVlQ6fTNWK3VtcQcVmBHqFb9fSbvYL!750321853; path=/; HttpOnly
Content length = -1
Content encoding = null
Fatal transport error: Not in GZIP format
java.io.IOException: Not in GZIP format
– mahesh
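Since the two header dumps above disagree about Content-Encoding, and wrapping the body in a GZIPInputStream fails with "Not in GZIP format", one possible workaround is to look at the body itself rather than trusting the header. The sketch below is only an illustration under that assumption, not a confirmed fix for this server: it peeks at the first two bytes of the stream and decompresses only when they match the gzip magic number (0x1f 0x8b). The class name GzipSniffer and its method names are hypothetical, and it uses only java.io and java.util.zip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: decides from the body bytes whether to gunzip.
public class GzipSniffer {

    // Returns a stream that yields decompressed bytes if the input starts
    // with the gzip magic number, and the untouched bytes otherwise.
    public static InputStream maybeGunzip(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] head = new byte[2];
        int n = 0;
        // A single read may return fewer than 2 bytes, so loop.
        while (n < 2) {
            int r = in.read(head, n, 2 - n);
            if (r < 0) {
                break; // stream ended early
            }
            n += r;
        }
        if (n > 0) {
            in.unread(head, 0, n); // put the peeked bytes back
        }
        boolean gzipped = n == 2
                && (head[0] & 0xff) == 0x1f
                && (head[1] & 0xff) == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Reads the whole body as text in the given charset, gunzipping if needed.
    public static String readBody(InputStream raw, String charset) throws IOException {
        Reader reader = new InputStreamReader(maybeGunzip(raw), charset);
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    // Demo-only helper: gzip-compresses a string so the sketch can be tried
    // without a network connection.
    public static byte[] gzip(String text, String charset) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(text.getBytes(charset));
        gz.close();
        return bos.toByteArray();
    }
}
```

With the code from the question, this would presumably be called as GzipSniffer.readBody(method.getResponseBodyAsStream(), "UTF-8"), taking the charset from the Content-Type header shown above. If the body turns out not to start with the gzip magic bytes, it is returned as-is, which at least avoids the IOException and lets you inspect what the server actually sent.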