我正在寫一些java代碼以獲取某些維基百科文章的原始文本(給出一個單詞列表,在維基百科中搜索它們並提取相應文章的第一句)。我的GUI包含了我定義瞭如下的動作監聽按鈕:來自Wikipeda的摘錄文章的文本
private void loadButtonActionPerformed(java.awt.event.ActionEvent evt) {
final DefaultListModel conceptsListFilesModel = new DefaultListModel();
conceptsList.setModel(conceptsListFilesModel);
final List definitionWiki = new ArrayList();
//Remplir la list avec la première collone de la liste
final Thread updater = new Thread(){
@Override public void run() {
for(int i=0; i< 20 /*dataTable.getRowCount()*/ ; i++) {
conceptsListFilesModel.addElement(dataTable.getValueAt(i, 0));
try {
Object concept = conceptsListFilesModel.elementAt(i);
WikipediaParser parser = new WikipediaParser("en");
System.out.println(concept+"");
String firstParagraph = parser.fetchFirstParagraph(concept+"");
int point = firstParagraph.indexOf(".");
String firstsentence = firstParagraph.substring(0, point+1);
definitionWiki.add(i, firstsentence) ;
} catch (IOException ex) {
Logger.getLogger(Tex2TaxView.class.getName()).log(Level.SEVERE, null, ex);
}
try { Thread.sleep(1000);
} catch (InterruptedException e) {throw new RuntimeException(e) ;}
}
JOptionPane.showMessageDialog(null, "Successful loading !") ;
}
};
updater.start();
}
的WikipediaParser類:
public class WikipediaParser {
private final String baseUrl;
public WikipediaParser(String lang) {
this.baseUrl = String.format("http://%s.wikipedia.org/wiki/", lang);
}
public String fetchFirstParagraph(String article) throws IOException {
String url = baseUrl + article;
Document doc = Jsoup.connect(url).get();
Elements paragraphs = doc.select(".mw-content-ltr p");
Element firstParagraph = paragraphs.first();
return firstParagraph.text();
}
}
執行產生異常的下面的列表:
nov. 30, 2011 12:42:55 AM tex2tax.Tex2TaxView$11 run
Grave: null java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:150)
at java.net.SocketInputStream.read(SocketInputStream.java:121)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:641)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:589)
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1319)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
at tex2tax.WikipediaParser.fetchFirstParagraph(WikipediaParser.java:25)
at tex2tax.Tex2TaxView$11.run(Tex2TaxView.java:595)
需要幫助解決此問題
維基百科沒有任何類型的API?讓我感到驚訝。還是那個不祥的'WikipediaParser'?似乎有信息丟失:) – Voo
我嘗試使用JWPL,但它不適合我。所以我更喜歡訪問在線維基百科。 WikipediaParser是我爲了使用jSoup解析文本而寫的一個類。 – Lida
添加您的WikipediaParser類 –