Saving web pages

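Please help me with the following. We have a text file containing many links to different websites (each link is on its own line, written in the form http://test.com). A Java program needs to go through all of the links and save the pages of those sites into the folder C:\html for testing, with each page saved under the same name as the one in its tag.

Source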

2012-11-19 Eric Scot

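[What have you tried?](http://www.whathaveyoutried.com/)

Well, or point me to reference material that describes how to do this. But if you write the code for me, I would be very grateful.

So, who will share a link? :)

Here is code that reads URLs from a txt file, as you described in the question, and writes the content of each page to another file.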

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class UrlListDownloader {

    public static void main(String[] args) {
        // urlList.txt holds one URL per line.
        try (BufferedReader reader = new BufferedReader(new FileReader("urlList.txt"))) {
            String url;
            int i = 0;
            while ((url = reader.readLine()) != null) {
                try {
                    getContent(url, i);
                } catch (IOException io) {
                    // A failure on one URL should not stop the remaining downloads.
                    System.out.println(io);
                }
                i++;
            }
        } catch (IOException io) {
            System.out.println(io);
        }
    }

    // Downloads the page at the given URL into file_content_<index>.txt.
    private static void getContent(String url, int index) throws IOException {
        URL pageUrl = new URL(url);
        URLConnection conn = pageUrl.openConnection();
        conn.connect();

        String htmlFileName = "file_content_" + index + ".txt";
        // try-with-resources closes both the reader and the writer, even if copying fails.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
             BufferedWriter bWriter = new BufferedWriter(new FileWriter(htmlFileName))) {
            String urlData;
            while ((urlData = reader.readLine()) != null) {
                bWriter.write(urlData);
                bWriter.newLine();
            }
        }
    }
}
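The answer above names each file by its index. The question asks for names taken from a tag; assuming that means the HTML `<title>` element, a minimal sketch of deriving a file name from the downloaded HTML could look like this. `TitleFileNames`, `fileNameFor`, and the fallback name are hypothetical, and the regex is only a shortcut: a real HTML parser would cope better with unusual markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TitleFileNames {
    // Matches the first <title>...</title> pair, ignoring case and spanning line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns a file name built from the page title, or from the fallback if no usable title exists. */
    static String fileNameFor(String html, String fallback) {
        Matcher m = TITLE.matcher(html);
        String title = m.find() ? m.group(1).trim() : "";
        // Replace characters that Windows forbids in file names, then collapse runs of whitespace.
        String safe = title.replaceAll("[\\\\/:*?\"<>|]", "_").replaceAll("\\s+", " ");
        return (safe.isEmpty() ? fallback : safe) + ".html";
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("<html><head><title>Test: page</title></head></html>", "page_0"));
    }
}
```

Two pages can share a title, so in practice the index could be kept as a suffix to avoid one page overwriting another.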

Source

2012-11-19 13:15:54 Victor

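Thanks, but what I am interested in is having the program take the links from the file and store all of the pages in their own format.

So, @EricScot, you need to state that in your question! – Victor

I did state that I need to take all the links from a text file and save the information retrieved from them. Could you give a more detailed answer? It is not very clear to a beginner.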

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class URLReader
{
    public static void main(String[] args)
    {
        try
        {
            // Open a connection to a single hard-coded page.
            URL pageUrl = new URL("https://www.google.ru/");
            URLConnection conn = pageUrl.openConnection();
            conn.connect();

            String htmlFileName = "C:\\hello.html";
            // try-with-resources closes both the reader and the writer, even if copying fails.
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                 BufferedWriter bWriter = new BufferedWriter(new FileWriter(htmlFileName)))
            {
                String urlData;
                while ((urlData = reader.readLine()) != null)
                {
                    bWriter.write(urlData);
                    bWriter.newLine();
                }
            }
        }
        catch (IOException io)
        {
            System.out.println(io);
        }
    }
}
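The attempt above handles one hard-coded URL. A rough sketch of extending it to a whole list of links and a target folder, as the question describes, might look like the following. `LinkListSaver`, `readLinks`, `saveAll`, the `links.txt` file name, and the `page_<n>.html` naming are all assumptions made for illustration; only the folder C:\html comes from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class LinkListSaver {
    /** Reads one link per line, trimming whitespace and skipping blank lines. */
    static List<String> readLinks(Path listFile) throws IOException {
        List<String> links = new ArrayList<>();
        for (String line : Files.readAllLines(listFile)) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                links.add(trimmed);
            }
        }
        return links;
    }

    /** Saves every link into outDir as page_<n>.html; a failed link is reported and skipped. */
    static void saveAll(List<String> links, Path outDir) throws IOException {
        Files.createDirectories(outDir); // create the folder if it is missing
        for (int i = 0; i < links.size(); i++) {
            Path target = outDir.resolve("page_" + i + ".html");
            try (InputStream in = new URL(links.get(i)).openStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                System.out.println("Skipping " + links.get(i) + ": " + e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // links.txt is a hypothetical name; C:\html is the folder named in the question.
        saveAll(readLinks(Paths.get("links.txt")), Paths.get("C:\\html"));
    }
}
```

Keeping the list parsing separate from the downloading makes it possible to check the parsing on its own, without any network access.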

@Victor here is a start. Could you improve the code so that it does everything described in my question? Please.

Source

2012-11-19 14:29:15

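Look at my answer! – Victor

I asked a similar question a while ago: Reading website's contents into string

Instead of reading the content into a string, you can copy it to a FileOutputStream. There is a nice function for this in IOUtils from Apache Commons IO: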

copy(InputStream input, OutputStream output) 
Copy bytes from an InputStream to an OutputStream.

http://commons.apache.org/io/api-release/org/apache/commons/io/IOUtils.html
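For anyone who would rather not add a dependency, the same stream-to-file idea can be sketched with the standard library's `Files.copy(InputStream, Path, CopyOption...)` in place of `IOUtils.copy`. This is a substitute technique, not the Commons IO call itself, and `StreamToFile`, `save`, and `download` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamToFile {
    /** Copies every byte of the stream into the target file, replacing the file if it exists. */
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Opens the URL and saves the raw response body to the target file. */
    static long download(String url, Path target) throws IOException {
        try (InputStream in = new URL(url).openStream()) {
            return save(in, target);
        }
    }
}
```

Because the bytes are copied unchanged, the saved file keeps whatever encoding the server sent, which a Reader/Writer pair using the platform default charset does not guarantee.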

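If you also want to download the images and other files that belong to your web pages, you had better use a library.

Of course, if you are learning, you can implement it yourself. Regular expressions can be used to find the links to images in an HTML file.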
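As a rough illustration of the regular-expression approach, the sketch below collects the `src` values of `<img>` tags. `ImageLinks` and `find` are hypothetical names, and the pattern assumes quoted attribute values on a single line; it is a learning aid rather than a complete HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageLinks {
    // Matches <img ... src="..."> or src='...'; group 1 is the quote, group 2 the attribute value.
    // Requiring whitespace before "src" avoids matching attributes such as data-src.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the src value of every <img> tag, in document order. */
    static List<String> find(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }
}
```

The returned values may be relative paths, so they would still need to be resolved against the page URL before downloading.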

Source

2012-11-19 14:34:37
