2010-10-25

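Apache HttpClient returns the page before it has loaded?

I noticed some strange behavior when using the Apache HttpClient library, and I would like to know why it happens. I wrote some sample code to demonstrate it. Consider the following code: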

// Example URL
String url = "http://rads.stackoverflow.com/amzn/click/05961580";
GetMethod get = new GetMethod(url);
// Retry once; do not retry a request that was already sent.
HttpMethodRetryHandler httpHandler = new DefaultHttpMethodRetryHandler(1, false);
get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER, httpHandler);
get.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
HttpConnectionManager connectionManager = new SimpleHttpConnectionManager();
HttpClient client = new HttpClient(connectionManager);
// FIREFOX is a user-agent string constant defined elsewhere.
client.getParams().setParameter("http.useragent", FIREFOX);
String line;
StringBuilder stringBuilder = new StringBuilder();
String toStreamBody = null;
String toStringBody = null;
try {
    int statusCode = client.executeMethod(get);
    if (statusCode != HttpStatus.SC_OK) {
        System.err.println("Internet Status: " + HttpStatus.getStatusText(statusCode));
        System.err.println("While getting page: " + url);
    }
    // toString
    toStringBody = get.getResponseBodyAsString();
    // toStream
    InputStreamReader isr = new InputStreamReader(get.getResponseBodyAsStream());
    BufferedReader rd = new BufferedReader(isr);
    while ((line = rd.readLine()) != null) {
        stringBuilder.append(line);
    }
} catch (java.io.IOException ex) {
    System.out.println("Failed to get page: " + url);
} finally {
    get.releaseConnection();
}
toStreamBody = stringBuilder.toString();

This prints nothing:

System.out.println(toStringBody); // "" 

This prints the web page:

System.out.println(toStreamBody); // "Whole Page" 

But it gets stranger. Replace:

get.getResponseBodyAsString(); 

with:

get.getResponseBodyAsString(150000); 

Now we get the error: Failed to get page: http://www.amazon.com/gp/offer-listing/0596158068/ref=dp_olp_used?ie=UTF8

I could not find any site other than Amazon that reproduces this behavior, but I assume there are others.

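I know that the documentation at http://hc.apache.org/httpclient-3.x/performance.html discourages using getResponseBodyAsString(), but it does not say the page will fail to load, only that you risk an out-of-memory exception. Is it possible that getResponseBodyAsString() returns before the page has loaded? Why does this only happen with Amazon?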
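One possible explanation for the error with the 150000 limit, which is an assumption and not confirmed anywhere in this thread, is that the page is larger than the limit and the call reports that as an IOException. The catch block in the question prints only the URL, so the actual cause stays hidden. The sketch below uses only the standard library and a hypothetical helper, readBounded, to show a size-limited read and a catch block that prints the exception message. It is a stand-in for illustration, not HttpClient's implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient: reads a stream into a String
// and fails loudly once more than maxLen bytes have been seen.
class BoundedBodyReader {

    static String readBounded(InputStream in, int maxLen) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            if (out.size() + n > maxLen) {
                throw new IOException("Body is larger than " + maxLen + " bytes");
            }
            out.write(buffer, 0, n);
        }
        // UTF-8 is an assumption; a real response declares its own charset.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a response body; no network access is needed.
        InputStream body = new ByteArrayInputStream(
                "0123456789".getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(readBounded(body, 5));
        } catch (IOException ex) {
            // Print the exception itself so the real cause is visible.
            System.out.println("Failed to get page: " + ex.getMessage());
        }
    }
}
```

Printing ex (or at least ex.getMessage()) in the question's own catch block would show whether a size limit, a network failure, or something else triggered the error.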

Answers

Have you tested any other URLs?

The URL in the code you provided responds with a 302 redirect to http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20, which then returns 404 (Not Found).

HttpClient does not handle redirects: http://hc.apache.org/httpclient-3.x/redirects.html
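To check by hand what the answer above describes, one can look at the status code and the Location header of the first response. The sketch below uses only the standard library and a hypothetical RedirectSketch class; the status code and target URL are the values quoted in the answer, hard-coded here rather than fetched, so no network access is involved.

```java
import java.net.URI;

// Hypothetical helper, not part of HttpClient.
class RedirectSketch {

    // Status codes that normally carry a Location header to follow.
    static boolean isRedirect(int statusCode) {
        return statusCode == 301 || statusCode == 302
                || statusCode == 303 || statusCode == 307;
    }

    // Resolves a Location header value against the URL that was requested,
    // so both absolute and relative targets yield an absolute URI.
    static URI resolveLocation(URI requested, String location) {
        return requested.resolve(location);
    }

    public static void main(String[] args) {
        URI requested = URI.create("http://rads.stackoverflow.com/amzn/click/05961580");
        int statusCode = 302; // value reported in the answer, not fetched here
        String location = "http://www.amazon.com/dp/05961580/?tag=stackoverfl08-20";
        if (isRedirect(statusCode)) {
            System.out.println("Redirected to: " + resolveLocation(requested, location));
        }
    }
}
```

In the question's code, the same check could be applied to the value returned by executeMethod before the body is read, so that a redirect or a 404 is not mistaken for an empty page.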

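Oh, that's not the right link. I'll try to change it back. – Bob 2010-10-26 08:16:40

The site is http://www.amazon.com/gp/offer-listing/0596158068/ref=dp_olp_used?ie=UTF8 – Bob 2010-10-26 08:16:56

Okay, for some reason the URL keeps getting changed by the site. There's nothing I can do about it. – Bob 2010-10-26 08:27:22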