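Slow download with HttpURLConnection
I'm trying to write a method that downloads a web page. First, I create an HttpURLConnection. Second, I call the connect() method. Third, I read the data through a BufferedReader.
The problem is that some pages are read in a reasonable time, but others are extremely slow (it can take about 10 minutes!). The slow pages are always the same ones, and they all come from the same website. Opening those pages in a browser takes a few seconds, not 10 minutes. Here is the code: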
static private String getWebPage(PageNode pagenode)
{
    String result;
    String inputLine;
    URI url;
    int cicliLettura=0;
    long startTime=0, endTime, openConnTime=0,connTime=0, readTime=0;
    try
    {
        if(Core.logGetWebPage())
            startTime=System.nanoTime();
        result="";
        url=pagenode.getUri();
        if(Core.logGetWebPage())
            openConnTime=System.nanoTime();
        HttpURLConnection yc = (HttpURLConnection) url.toURL().openConnection();
        if(url.toURL().getProtocol().equalsIgnoreCase("https"))
            yc=(HttpsURLConnection)yc;
        yc.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 3.5.30729)");
        yc.connect();
        if(Core.logGetWebPage())
            connTime=System.nanoTime();
        BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
        while ((inputLine = in.readLine()) != null)
        {
            result=result+inputLine+"\n";
            cicliLettura++;
        }
        if(Core.logGetWebPage())
            readTime=System.nanoTime();
        in.close();
        yc.disconnect();
        if(Core.logGetWebPage())
        {
            endTime=System.nanoTime();
            System.out.println(/*result+*/"getWebPage eseguito in "+(endTime-startTime)/1000000+" ms. Size: "+result.length()+" Response Code="+yc.getResponseCode()+" Protocollo="+url.toURL().getProtocol()+" openConnTime: "+(openConnTime-startTime)/1000000+" connTime:"+(connTime-openConnTime)/1000000+" readTime:"+(readTime-connTime)/1000000+" cicliLettura="+cicliLettura);
        }
        return result;
    }catch(IOException e){
        System.out.println("Eccezione: "+e.toString());
        e.printStackTrace();
        return null;
    }
}
Here are two log samples. A "normal" web page:
getWebPage executed. Size: 48261 Response Code=200 Protocollo=http openConnTime: 0 connTime:1 readTime:569 cicliLettura=359
A "slow" web page, http://ricette.giallozafferano.it/Pan-di-spagna-al-cacao.html/allcomments, looks like this:
getWebPage executed. Size: 1748261 Response Code=200 Protocollo=http openConnTime: 0 connTime:1 readTime:596834 cicliLettura=35685
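Both logs report a connTime of about 1 ms, so nearly all of the time is spent in the read loop, and the slow page has roughly 100 times as many lines (35685 against 359). One plausible cause, assuming the network itself is not the bottleneck, is the line result=result+inputLine+"\n": Java strings are immutable, so every pass builds a new String that contains everything read so far. The following is a minimal sketch of the same loop accumulating into a StringBuilder instead. PageReader and readAll are hypothetical names introduced for illustration, and wrapping IOException in UncheckedIOException is a choice made for this sketch, not something the original code does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of the original code.
class PageReader {
    // Reads every line from the reader and terminates each with "\n",
    // appending into one growing buffer instead of rebuilding a String per line.
    static String readAll(Reader source) {
        StringBuilder result = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader in = new BufferedReader(source)) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                result.append(inputLine).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the HTTP response body.
        System.out.print(readAll(new StringReader("first line\nsecond line")));
    }
}
```

Inside getWebPage this would be called as readAll(new InputStreamReader(yc.getInputStream())) in place of the while loop. The returned text matches what the original loop produces, since both append "\n" after every line.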
Just great, a very kind answer. Could you post an explanation of this? Here is the new output for that page: getWebPage executed. Size: 1709466 Response Code=200 Protocollo=http openConnTime: 0 connTime:0 readTime:2257 cicliLettura=35686 – mark
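A rough estimate of why repeated concatenation could account for the gap between readTime:596834 and readTime:2257, assuming the slow version spends its time in result=result+inputLine+"\n". Each concatenation builds a new String holding everything read so far, so the work grows with the square of the line count, while appending to a single buffer touches each character about once. The sketch below counts copied characters under the simplifying assumption that every line has the same length. ConcatCost and its methods are hypothetical names, and the model ignores garbage collection and buffer regrowth.

```java
// Hypothetical cost model for illustration; it estimates work, it does not measure it.
class ConcatCost {
    // Rebuilding the string once per line: after line i the new string holds
    // i * lineLength characters, so the total is lineLength * n * (n + 1) / 2.
    static long copiedByConcatenation(long lines, long lineLength) {
        return lineLength * lines * (lines + 1) / 2;
    }

    // Appending to one growing buffer: each character is written once
    // (ignoring the occasional internal buffer regrowth).
    static long copiedByAppending(long lines, long lineLength) {
        return lines * lineLength;
    }

    public static void main(String[] args) {
        long lines = 35_686;                  // cicliLettura from the log
        long lineLength = 1_709_466 / lines;  // average line length, rounded down
        System.out.println("concatenation: " + copiedByConcatenation(lines, lineLength));
        System.out.println("appending:     " + copiedByAppending(lines, lineLength));
    }
}
```

With the figures from the log, the concatenation estimate is in the tens of billions of characters copied, against under two million for appending. That ratio is only indicative, but it points the same way as the drop in the measured read time.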