2017-05-11 24 views

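Return all text in links from a specific div with JSoup

I am using JSoup to parse a list of links inside a specific div. I can get the links with the cssQuery syntax, but I cannot get the text inside them: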

private static Elements getLinkList(String URL) throws IOException {
    /* Download HTML page */
    URL website = new URL(URL);
    ReadableByteChannel readableByteChannel = Channels.newChannel(website.openStream());
    FileOutputStream fileOutputStream = new FileOutputStream(HTML_DOC);
    fileOutputStream.getChannel().transferFrom(readableByteChannel, 0, Long.MAX_VALUE);

    /* Collect list of links */
    File input = new File(HTML_DOC);
    Document document = Jsoup.parse(input, "UTF-8", URL);

    return document.select("#div>a");
}

I want to get the text from inside the tag, but it comes back blank.

<div id="div"> 
    <a href="http://www.sample.com/doc.doc" target="_blank">Installation guideline - Citrix XenApp 7.6 for PAS-X.doc<br></a> 
</div> 

Answer

The answer requires no change to the code I posted in my original post:

private static Elements getLinkList(String URL) throws IOException {
    /* Download HTML page */
    URL website = new URL(URL);
    ReadableByteChannel readableByteChannel = Channels.newChannel(website.openStream());
    FileOutputStream fileOutputStream = new FileOutputStream(HTML_DOC);
    fileOutputStream.getChannel().transferFrom(readableByteChannel, 0, Long.MAX_VALUE);

    /* Collect list of links */
    File input = new File(HTML_DOC);
    Document document = Jsoup.parse(input, "UTF-8", URL);

    return document.select("#div>a");
}
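
As an aside, the download step in this method never closes its stream, channel, or file output stream. A minimal sketch of the same step with try-with-resources follows, using only the standard library. The class name PageDownloader and the method download are illustrative names, not part of the original post.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PageDownloader {

    /**
     * Copies the resource at pageUrl into target and returns the number of
     * bytes written. The input stream is closed even if the copy fails.
     */
    static long download(String pageUrl, Path target) throws IOException {
        try (InputStream in = URI.create(pageUrl).toURL().openStream()) {
            // REPLACE_EXISTING overwrites a file left over from an earlier run
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The saved file can then be handed to Jsoup.parse(File, String, String) exactly as in the method above.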

The text is retrieved when processing the data, directly from each link Element, i.e. String titleText = link.text():

Elements links = getLinkList(URL); // Retrieve list of Elements from above method

for (Element link : links) {
    String linkText = link.toString();
    String titleText = link.text();
    String formattedLink = org.apache.commons.lang3.StringUtils.substringBetween(linkText, "<a href=\"", "\"");

    System.out.println(titleText);
    System.out.println(formattedLink);
}
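
The loop above recovers the URL by string-matching the element's HTML, which only works while href is the first attribute of the tag. A possible alternative, sketched here under assumptions, reads the attribute through jsoup's Element.absUrl and avoids the Commons Lang dependency. The class name LinkTextDemo, the helper extractLinks, and the base URI are made up for illustration; the HTML is the sample from the question, and jsoup is assumed to be on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkTextDemo {

    /** Returns {text, href} pairs for every <a> that is a direct child of #div. */
    static List<String[]> extractLinks(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        List<String[]> result = new ArrayList<>();
        for (Element link : document.select("#div > a")) {
            String titleText = link.text();    // visible text of the link
            String href = link.absUrl("href"); // href resolved against baseUri
            result.add(new String[] {titleText, href});
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup copied from the question
        String html = "<div id=\"div\">"
                + "<a href=\"http://www.sample.com/doc.doc\" target=\"_blank\">"
                + "Installation guideline - Citrix XenApp 7.6 for PAS-X.doc<br></a>"
                + "</div>";
        for (String[] pair : extractLinks(html, "http://www.sample.com/")) {
            System.out.println(pair[0]);
            System.out.println(pair[1]);
        }
    }
}
```

Use link.attr("href") instead of absUrl if the raw attribute value, rather than the resolved URL, is wanted.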