Get page content from a URL?

I want to use this code to get the content of a page from a URL:

public static String getContentResult(URL url) throws IOException {
    InputStream in = url.openStream();
    try {
        StringBuffer sb = new StringBuffer();
        byte[] buffer = new byte[256];

        while (true) {
            int byteRead = in.read(buffer);
            if (byteRead == -1)
                break;
            // Casting each byte to char is only safe for single-byte (ASCII) text.
            for (int i = 0; i < byteRead; i++) {
                sb.append((char) buffer[i]);
            }
        }
        return sb.toString();
    } finally {
        in.close(); // release the connection even if reading fails
    }
}

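But with this URL: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315 I cannot get the abstract: "Database management systems will continue to manage....."

Can you give me a solution to this problem? Thanks in advance.

Source

2010-11-18 tiendv

Possible duplicate: http://stackoverflow.com/questions/1255730/java-retrieve-html-page-in-proper-encoding – 2010-11-18 15:32:01

@Matt Ball The issue here is that the OP needs to execute JavaScript to get the desired content, so in that sense this question is fundamentally different. – 2010-11-18 15:33:36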

Response headers for the GET request (1.4.3):

HTTP/1.1 302 Moved Temporarily 
Connection: close 
Date: Thu, 18 Nov 2010 15:35:24 GMT 
Server: Microsoft-IIS/6.0 
location: http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE 
Content-Type: text/html; charset=UTF-8

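This means the server wants you to fetch the resource from the new location. So either read the headers directly from the URLConnection and follow the link yourself, or use HttpClient, which follows redirects automatically. The code below is based on HttpClient: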

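import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

// Apache HttpComponents 4.x: httpclient plus its httpcore dependency
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

public class HttpTest {
    public static void main(String... args) throws Exception {
        System.out.println(readPage(new URL("http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315")));
    }

    private static String readPage(URL url) throws Exception {
        DefaultHttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(url.toURI());
        HttpResponse response = client.execute(request); // follows the 302 automatically

        // Decode with the charset the server declares; UTF-8 is an assumed fallback.
        String charset = EntityUtils.getContentCharSet(response.getEntity());
        if (charset == null) {
            charset = "UTF-8";
        }

        Reader reader = null;
        try {
            reader = new InputStreamReader(response.getEntity().getContent(), charset);

            StringBuilder sb = new StringBuilder();
            int read;
            char[] cbuf = new char[1024];
            while ((read = reader.read(cbuf)) != -1) {
                sb.append(cbuf, 0, read);
            }
            return sb.toString();
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}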
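
The other option mentioned above, reading the Location header from the connection and following it by hand, might look roughly like the sketch below. It uses only the JDK's HttpURLConnection. The class name RedirectFollower, the MAX_REDIRECTS limit, and the UTF-8 decoding are assumptions made for this illustration, not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;

class RedirectFollower {

    // Hypothetical limit, chosen for this sketch, to avoid redirect loops.
    private static final int MAX_REDIRECTS = 5;

    /** True for the status codes that carry a Location header to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303 || status == 307;
    }

    /** Resolves a Location value, which may be relative, against the current URL. */
    static URL resolve(URL current, String location) throws IOException {
        return new URL(current, location);
    }

    static String readPage(URL url) throws IOException {
        URL current = url;
        for (int i = 0; i <= MAX_REDIRECTS; i++) {
            HttpURLConnection conn = (HttpURLConnection) current.openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects by hand
            int status = conn.getResponseCode();

            if (isRedirect(status)) {
                String location = conn.getHeaderField("Location");
                conn.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without a Location header");
                }
                current = resolve(current, location);
                continue;
            }

            // Assumption: the body is UTF-8, as the Content-Type header above declares.
            Reader reader = new InputStreamReader(conn.getInputStream(), "UTF-8");
            try {
                StringBuilder sb = new StringBuilder();
                char[] cbuf = new char[1024];
                int read;
                while ((read = reader.read(cbuf)) != -1) {
                    sb.append(cbuf, 0, read);
                }
                return sb.toString();
            } finally {
                reader.close();
            }
        }
        throw new IOException("Too many redirects starting from " + url);
    }
}
```

Calling RedirectFollower.readPage(new URL("...")) with the URL from the question should then return the body of the page the 302 points to. Note that HttpURLConnection already follows same-protocol redirects by default; the sketch switches that off only to make each step visible.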

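Source

2010-11-18 15:36:27 dacwe

Can you tell me which libraries this code uses? I cannot get it to run with Apache's httpcore! – tiendv 2010-11-19 07:02:24

I can run your code now! But the result is the same as with my code. Can you give me any advice? – tiendv 2010-11-21 15:14:06

@tiendv: I just tried this code and got the redirected page as expected. What are you trying to get? – dacwe 2010-11-21 16:54:42

The given URL does not contain "Database management...". Perhaps it is loaded dynamically by JavaScript. You would need a more sophisticated application to download that kind of content ;)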

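Source

2010-11-18 15:33:58

The content you are looking for is not included at this URL. Open the page in a browser and view its source: what you will find instead is a lot of JavaScript files being loaded. I think the content is fetched later by an AJAX call. You need to find out how the content is loaded.

The Firefox plugin Firebug may help with a more detailed analysis.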
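
A rough way to run the same check in code is to search the downloaded HTML for the text you expect and count how many script tags it contains. The sketch below works on a made-up HTML string rather than the real ACM page, and the class and method names are hypothetical. It only hints that content may be loaded by scripts; it does not execute any JavaScript.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticContentCheck {

    /** Case-insensitive check for the expected text in the raw HTML. */
    static boolean containsText(String html, String expected) {
        return html.toLowerCase(Locale.ROOT).contains(expected.toLowerCase(Locale.ROOT));
    }

    /** Counts opening script tags; closing tags are not matched. */
    static int countScriptTags(String html) {
        Matcher m = Pattern.compile("<script\\b", Pattern.CASE_INSENSITIVE).matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample: an empty placeholder div plus two script tags.
        String html = "<html><head><script src=\"a.js\"></script>"
                + "<SCRIPT>load()</SCRIPT></head>"
                + "<body><div id=\"abstract\"></div></body></html>";

        System.out.println(containsText(html, "Database management")); // false
        System.out.println(countScriptTags(html));                     // 2
    }
}
```

If the expected text is missing while the page pulls in many scripts, the next step is to watch the browser's network traffic and find the request that actually returns the text.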

Source

2010-11-18 15:34:05 stacker

The URL you should use is:

http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE

because the original URL you posted sends a redirect (as dacwe mentioned).
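
If you have many such links, one possible shortcut is to drop the CFID and CFTOKEN parameters before requesting the page, since the redirect target shown above is the same URL without them. This is only a sketch: it assumes those two parameters are session tokens that are safe to remove, the class and method names are hypothetical, and it does not handle URL fragments.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class QueryParamStripper {

    /** Returns the URL with the named query parameters removed, keeping the order of the rest. */
    static String stripParams(String url, Set<String> namesToDrop) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to strip
        }
        String base = url.substring(0, q);
        StringBuilder kept = new StringBuilder();
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (namesToDrop.contains(name)) {
                continue;
            }
            if (kept.length() > 0) {
                kept.append('&');
            }
            kept.append(pair);
        }
        return kept.length() == 0 ? base : base + "?" + kept;
    }

    public static void main(String[] args) {
        Set<String> drop = new HashSet<String>(Arrays.asList("CFID", "CFTOKEN"));
        String original = "http://portal.acm.org/citation.cfm?id=152610.152611"
                + "&coll=DL&dl=GUIDE&CFID=114782066&CFTOKEN=85539315";
        // prints http://portal.acm.org/citation.cfm?id=152610.152611&coll=DL&dl=GUIDE
        System.out.println(stripParams(original, drop));
    }
}
```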

Source

2010-11-18 15:40:45 user3111525
