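Crawling a web page and storing links

I want to create a thread that crawls all the links on a website and stores them in a LinkedHashSet, but when I print the size of this LinkedHashSet, nothing is printed. I have only just started learning about crawling! I used The Art of Java as a reference. Here is my code: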
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TestThread {

    public void crawl(URL url) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openConnection().getInputStream()));
            String line = reader.readLine();
            LinkedHashSet toCrawlList = new LinkedHashSet();
            while (line != null) {
                toCrawlList.add(line);
                System.out.println(toCrawlList.size());
            }
        } catch (IOException ex) {
            Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public static void main(String[] args) {
        final TestThread test1 = new TestThread();
        Thread thread = new Thread(new Runnable() {
            public void run() {
                try {
                    test1.crawl(new URL("http://stackoverflow.com/"));
                } catch (MalformedURLException ex) {
                    Logger.getLogger(TestThread.class.getName()).log(Level.SEVERE, null, ex);
                }
            }
        });
    }
}
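What is the problem? – Marcin 2014-10-20 07:15:24
I don't know how to get at all the links I have crawled and stored. I am just using a LinkedHashSet to store them, but when I crawl and print it, nothing is shown. – TrangVu 2014-10-21 10:46:01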
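
Reading the posted code, three things stand out. First, the Thread is constructed but `start()` is never called, so `crawl` never runs, which would explain why nothing is printed. Second, `readLine()` is called once before the loop and never again, so once the thread does run, `line` never changes and the loop would not terminate. Third, the set stores whole lines of HTML rather than links. Below is a minimal sketch of one way to address all three. The class name `LinkCrawler`, the helper `extractLinks`, and the `href` regular expression are assumptions made for illustration and are not from the original post; a regex is only a rough stand-in for a real HTML parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkCrawler {

    // Hypothetical, deliberately simple pattern for href="..." or href='...'.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Reads the source line by line and collects each distinct href value
    // in first-seen order.
    static Set<String> extractLinks(Reader source) throws IOException {
        Set<String> links = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // Reading inside the loop condition advances to the next line each pass.
            while ((line = reader.readLine()) != null) {
                Matcher matcher = HREF.matcher(line);
                while (matcher.find()) {
                    links.add(matcher.group(1));
                }
            }
        }
        return links;
    }

    // Downloads the page at the given URL and returns the links found in it.
    // UTF-8 is assumed here; the page's real encoding may differ.
    static Set<String> crawl(URL url) throws IOException {
        return extractLinks(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // URL taken from the question. If the site redirects from HTTP to HTTPS,
        // the redirect is not followed automatically and the result may be empty.
        final URL url = new URL("http://stackoverflow.com/");
        Thread thread = new Thread(() -> {
            try {
                Set<String> links = crawl(url);
                System.out.println(links.size() + " links found");
                links.forEach(System.out::println);
            } catch (IOException ex) {
                Logger.getLogger(LinkCrawler.class.getName()).log(Level.SEVERE, null, ex);
            }
        });
        thread.start(); // without this call the Runnable is never executed
        thread.join();  // wait for the crawl to finish before main returns
    }
}
```

Returning the set from `crawl` instead of keeping it in a local variable is what lets the caller print or reuse the stored links afterwards. If several threads were to add to the same set, it would need synchronization (for example `Collections.synchronizedSet`), which this sketch does not cover.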