Getting text from a website using JSoup

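I'm using JSoup to parse HTML websites. I want to get an article from, for example, Wikipedia. Specifically, I want to get the text in the "From today's featured article" table on the main page (http://en.wikipedia.org/wiki/Main_Page).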

Here is the code:

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get();
Elements el = doc.select("div.mp-tfa");
System.out.println(el);

The problem is that it doesn't work properly: it prints just an empty line. The "From today's featured article" table is inside div class="mp-tfa".

How can I get this text in my Java program?

Thanks in advance.

Source

2014-02-09 Ganjira

Change:

doc.select("div.mp-tfa");

to:

doc.select("div#mp-tfa");
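To see why that one character matters, here is a minimal offline sketch. It assumes the div on the page carries id="mp-tfa" rather than class="mp-tfa", which is what the fix above implies. The HTML string and the class name SelectorDemo are made up for illustration and are not the real Wikipedia markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical stand-in for the relevant part of the page:
    // the div is identified by an id, not by a class.
    static final String HTML =
        "<html><body><div id=\"mp-tfa\"><p>Featured article text</p></div></body></html>";

    // Counts the elements that a CSS query matches in the sample markup.
    static int countMatches(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // "." selects by class; no div in the sample has class "mp-tfa".
        System.out.println(countMatches("div.mp-tfa")); // prints 0
        // "#" selects by id; the sample div has id "mp-tfa".
        System.out.println(countMatches("div#mp-tfa")); // prints 1
    }
}
```

An empty Elements collection prints as an empty string, which would explain the blank line in the question.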

A better approach is to iterate over the Elements returned for the tag, class, or id you selected and print each one's text. Put simply:

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get(); 
Elements el = doc.select("div#mp-tfa"); 
for (Element e : el) { 
    System.out.println(e.text()); 
}

which gives:

The Boulonnais is a heavy draft horse breed from Fr....
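If you want something a bit more defensive, the following is a rough sketch rather than a definitive version. It assumes you only need the first matching element. The class name FeaturedArticleText, the helper firstText, the user agent string, and the 10 second timeout are all my own choices for illustration.

```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class FeaturedArticleText {
    // Hypothetical helper: returns the text of the first element matching
    // cssQuery, or null when the query matches nothing.
    static String firstText(Document doc, String cssQuery) {
        Element match = doc.select(cssQuery).first();
        return match == null ? null : match.text();
    }

    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page")
                    .userAgent("Mozilla/5.0") // assumed value, adjust as needed
                    .timeout(10_000)          // assumed timeout in milliseconds
                    .get();
            String text = firstText(doc, "div#mp-tfa");
            System.out.println(text != null ? text : "No element matched div#mp-tfa");
        } catch (IOException e) {
            // get() throws IOException on network or HTTP errors.
            System.err.println("Could not fetch the page: " + e.getMessage());
        }
    }
}
```

Checking for null makes a selector mismatch visible instead of silently printing nothing.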

Source

2014-02-09 07:59:20 PopoFibo

Thanks a lot! It helped! ;) – Ganjira

@托萊多 Glad to help :) – PopoFibo

I think it should be:

Document doc = Jsoup.connect("http://en.wikipedia.org/wiki/Main_Page").get(); 
Elements el = doc.select("div#mp-tfa"); 
System.out.println(el);
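Note that printing the Elements object directly gives you the matched markup, tags included, whereas text() gives only the readable text. A small offline sketch of the difference follows. The HTML string and the class name MarkupVersusText are invented for the example and are not the real page markup.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class MarkupVersusText {
    // Hypothetical sample markup standing in for the featured-article div.
    static final String HTML =
        "<div id=\"mp-tfa\"><p>Featured <b>article</b> text</p></div>";

    // Elements.toString() returns the outer HTML of the matched elements.
    static String asMarkup() {
        Elements el = Jsoup.parse(HTML).select("div#mp-tfa");
        return el.toString();
    }

    // Elements.text() returns the combined text with the tags stripped.
    static String asText() {
        Elements el = Jsoup.parse(HTML).select("div#mp-tfa");
        return el.text();
    }

    public static void main(String[] args) {
        System.out.println(asMarkup()); // HTML, including the <div>, <p> and <b> tags
        System.out.println(asText());   // Featured article text
    }
}
```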

Source

2014-02-09 07:59:54 theconsultingthief

Thanks a lot! It helped! ;) – Ganjira

Glad to help, although PopoFibo's answer is more thorough. :) – theconsultingthief
