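I have been trying to download the source of a Google News RSS feed. It downloads correctly, except that the links are not displayed properly. The HTML is not downloaded correctly.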
static String urlNotizie = "https://news.google.it/news/feeds?pz=1&cf=all&ned=it&hl=it&output=rss";
Document docHtml = Jsoup.connect(urlNotizie).get();
String html = docHtml.toString();
System.out.println(html);
Output:
<html>
<head></head>
<body>
<rss version="2.0">
<channel>
<generator>
NFE/1.0
</generator>
<title>Prima pagina - Google News</title>
<link />http://news.google.it/news?pz=1&ned=it&hl=it
<language>
it
</language>
<webmaster>
[email protected]
</webmaster>
<copyright>
&copy;2013 Google
</copyright> [...]
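Using a URLConnection I am able to print the correct source of the page. But when I parse it, I run into the same problem as above: I get a list of <link />.
(Again, this only affects the links; everything else parses fine.) The URLConnection example: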
URL u = new URL(urlNotizie);
URLConnection yc = u.openConnection();
StringBuilder builder = new StringBuilder();
BufferedReader reader = new BufferedReader(new InputStreamReader(
yc.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
builder.append(line);
builder.append("\n");
}
String html = builder.toString();
System.out.println("HTML " + html);
Document doc = Jsoup.parse(html);
Elements listaTitoli = doc.select("title");
Elements listaCategorie = doc.select("category");
Elements listaDescrizioni = doc.select("description");
Elements listaUrl = doc.select("link");
System.out.println(listaUrl);
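It downloads correctly, otherwise jsoup would not be able to turn it into a document in the first place; things apparently go wrong in the toString() method. Of course, you could just use a URLConnection or Apache HttpClient to fetch the RSS data directly instead. – Gimby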
Updated the question with the new code. –
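A likely explanation for the empty <link /> elements is that jsoup's default HTML parser treats <link> as a void element, so the URL ends up as loose text after the tag. The sketch below assumes the feed is well-formed XML and parses it with the JDK's built-in DOM parser rather than an HTML parser. The class name RssLinks, the helper extractLinks, and the inline sample feed are hypothetical and only serve as illustration; in practice the string would come from the URLConnection code shown in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RssLinks {
    // Hypothetical helper: returns the text content of every <link> element,
    // in document order. Throws if the input is not well-formed XML.
    static List<String> extractLinks(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName("link");
        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getTextContent().trim());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Small stand-in for the downloaded feed (assumption: real data would
        // be the string built from the URLConnection in the question).
        String sample = "<rss version=\"2.0\"><channel>"
                + "<title>Prima pagina - Google News</title>"
                + "<link>http://news.google.it/news?pz=1&amp;ned=it&amp;hl=it</link>"
                + "</channel></rss>";
        System.out.println(extractLinks(sample));
    }
}
```

If you would rather stay with jsoup, it also ships an XML parser: Jsoup.parse(html, "", Parser.xmlParser()) should keep the text inside <link> instead of applying HTML rules to it.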