2014-10-20 65 views
-1

I want to create a thread that crawls all the links on a website and stores them in a LinkedHashSet, but when I print the size of this LinkedHashSet, it prints nothing. I have just started learning about crawling! I referenced The Art of Java. Here is my code: crawling a web page and storing the links

import java.io.BufferedReader; 
import java.io.IOException; 
import java.io.InputStreamReader; 
import java.net.MalformedURLException; 
import java.net.URL; 
import java.util.LinkedHashSet; 
import java.util.logging.Level; 
import java.util.logging.Logger; 

public class TestThread { 

    public void crawl(URL url) { 
     try { 

      BufferedReader reader = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream())); 
      String line = reader.readLine(); 
      LinkedHashSet toCrawlList = new LinkedHashSet(); 

      while (line != null) { 
       toCrawlList.add(line); 
       System.out.println(toCrawlList.size()); 
      } 
     } catch (IOException ex) { 
      Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex); 
     } 

    } 

    public static void main(String[] args) { 
     final TestThread test1 = new TestThread(); 
     Thread thread = new Thread(new Runnable() { 
      public void run(){ 
       try { 
        test1.crawl(new URL("http://stackoverflow.com/")); 
       } catch (MalformedURLException ex) { 
        Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex); 
       } 
      } 
     }); 
    } 
} 
+1

What is the problem? – Marcin 2014-10-20 07:15:24

+0

I don't know how to get all the links that I have crawled and stored. I am just using a LinkedHashSet to store them, but when I crawl and print it, it shows nothing – TrangVu 2014-10-21 10:46:01

Answer

0

You should fill your list like this:

while ((line = reader.readLine()) != null) { 
    toCrawlList.add(line); 
} 
System.out.println(toCrawlList.size()); 

Your original loop calls reader.readLine() only once, before the loop, so line never changes and the loop spins forever without printing a final count. Also note that main() constructs the Thread but never calls thread.start(), so crawl() is never invoked at all. If it still doesn't work after fixing both, try setting a breakpoint in your code to find out whether your reader even receives any content.
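The corrected read loop can be sketched in isolation. This is a minimal, self-contained demo that uses a StringReader as a stand-in for the network stream (in the real crawler the reader would wrap url.openConnection().getInputStream()); the class name ReadLoopDemo and the sample lines are illustrative, not from the original post:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;

public class ReadLoopDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the page content; in the real crawler this reader
        // would wrap the URL connection's input stream instead.
        BufferedReader reader = new BufferedReader(
                new StringReader("http://a.example\nhttp://b.example\nhttp://c.example"));

        LinkedHashSet<String> toCrawlList = new LinkedHashSet<>();
        String line;
        // readLine() is called on every iteration, so the loop advances
        // through the stream and terminates at end-of-input instead of
        // re-testing the same first line forever.
        while ((line = reader.readLine()) != null) {
            toCrawlList.add(line);
        }
        // Print the size once, after the loop has consumed all input.
        System.out.println(toCrawlList.size()); // 3
    }
}
```

Note that a LinkedHashSet silently drops duplicate lines, which is why the size is printed after the loop rather than on every iteration.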