Why does my program only get part of a web page's source code?

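I have a program that pulls a web page's source code and saves it to a .txt file. It works when I fetch one page at a time, but when I run it in a loop over 100 pages, every page's source is suddenly cut off somewhere between 1/4 and 3/4 of the way through (seemingly at random). Any ideas about why this happens or how to fix it?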

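My first thought was that the loop was running too fast for Java (I run this Java program from a PHP script), but then I realized that it technically shouldn't move on to the next item until the current one has finished.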

Here is the code I'm using:

import java.io.*;
import java.net.URL;

public class selectout {

    public static BufferedReader read(String url) throws Exception {
        return new BufferedReader(
            new InputStreamReader(
                new URL(url).openStream()));
    }

    public static void main(String[] args) throws Exception {
        BufferedReader reader = read(args[0]);
        String line = reader.readLine();
        String thenum = args[1];
        FileWriter fstream = new FileWriter(thenum + ".txt");
        BufferedWriter out = new BufferedWriter(fstream);
        while (line != null) {
            out.write(line);
            out.newLine();
            //System.out.println(line);
            line = reader.readLine();
        }
    }
}

The PHP side is a basic mysql_query with a while(fetch_assoc) loop that grabs each URL from the database and then runs system("java -jar crawl.jar $url $filename");

It then calls fopen and fread on the new file, and finally saves the source to the database (after escaping the strings, etc.).

2011-08-31 Calvin

You need to close the output stream once you have finished writing each file. After the while loop, call out.close(); and fstream.close();
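For illustration, here is a minimal sketch of how the question's program might look with the streams closed. A BufferedWriter keeps written text in memory until it is flushed or closed, so whatever is still in the buffer when the JVM exits never reaches the file. The class name PageSaver and the helper save are hypothetical names chosen for this example, and try-with-resources assumes Java 7 or later:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

// Hypothetical rewrite of the question's class; the names are illustrative only.
class PageSaver {

    // Copies every line from source into the named file.
    // try-with-resources closes both streams on exit, and closing the
    // BufferedWriter flushes its buffer, so the end of the page is not lost.
    static void save(Reader source, String fileName) throws IOException {
        try (BufferedReader reader = new BufferedReader(source);
             BufferedWriter out = new BufferedWriter(new FileWriter(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the URL and args[1] the output name, as in the question.
        save(new InputStreamReader(new URL(args[0]).openStream()), args[1] + ".txt");
    }
}
```

Closing the BufferedWriter also closes the FileWriter it wraps, so a separate close on the FileWriter is harmless but not required.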

2011-08-31 18:26:53 Raider

Wow, can't believe I missed that. Thanks! – Calvin

You have to flush the stream and close it.

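finally { // Error handling ignored in my example
    // Flush and close the outer BufferedWriter; flushing only the FileWriter
    // would leave the BufferedWriter's own buffer unwritten.
    out.flush();
    out.close();
}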
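To see why the flush matters, the following sketch measures a file's length before and after the writer is closed. It is a demonstration under stated assumptions, not part of the original program: the class BufferDemo and the method lengthsBeforeAndAfterClose are hypothetical names, and it relies on the text being much smaller than the writer's internal buffer.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical demo class; not taken from the question or the answers.
class BufferDemo {

    // Writes text through a BufferedWriter and returns the file's length
    // measured before close() and again after close().
    static long[] lengthsBeforeAndAfterClose(String text) throws IOException {
        File file = File.createTempFile("buffer-demo", ".txt");
        file.deleteOnExit();
        BufferedWriter out = new BufferedWriter(new FileWriter(file));
        long before;
        try {
            out.write(text);
            // Short text is still sitting in the in-memory buffer here.
            before = file.length();
        } finally {
            // close() flushes the buffer and releases the file handle.
            out.close();
        }
        long after = file.length();
        return new long[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        long[] lengths = lengthsBeforeAndAfterClose("hello");
        System.out.println("before close: " + lengths[0] + " bytes");
        System.out.println("after close:  " + lengths[1] + " bytes");
    }
}
```

With a longer page the buffer fills and is written out several times along the way, which would explain why the saved files stop at a different point for each page: only the last, partially filled buffer is lost.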

2011-08-31 18:35:34 Woot4Moo
