How to get the title text from any web page in Java

I am using the tag name, as shown below, to get the images from a web page:

int i = 1;
InputStream in = new URL("http://www.yahoo.com").openStream();
org.w3c.dom.Document doc = new Tidy().parseDOM(in, null);
NodeList img = doc.getElementsByTagName("img");
ArrayList<String> list = new ArrayList<String>();
list.add(img.item(i).getAttributes().getNamedItem("src").getNodeValue());

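This works, but I want to get the title tag of the web page (www.yahoo.com) using the same code as above. I tried getElementsByTagName("title"), but it does not work. Please help me do this with the JTidy parser, as above.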

2011-05-07 DJ31

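Note that NodeList indices start at 0 (I see your "int i = 1;"): http://download.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/NodeList.html. Also, getNodeValue() gives you the value of an attribute (such as "src"), but for an element it returns null: http://download.oracle.com/javase/1.5.0/docs/api/org/w3c/dom/Node.html. In this case you can use getTextContent(), since I don't believe the "title" tag has child elements. So: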

String titleText = doc.getElementsByTagName("title").item(0).getTextContent();

Or:

String titleText = doc.getElementsByTagName("title").item(0).getFirstChild().getNodeValue();
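For anyone who wants to try this without JTidy or a network connection, here is a minimal sketch. It assumes the JDK's built-in DocumentBuilder and a small well-formed page held in a string, instead of JTidy and the live Yahoo page; the class and helper names are made up for illustration. The DOM calls are the same ones used above, plus a check for pages that have no title at all.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class TitleExtractor {

    // Returns the text of the first <title> element, or null if there is none.
    static String titleOf(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        if (titles.getLength() == 0) {
            return null; // the page has no <title>
        }
        Node title = titles.item(0); // NodeList indices start at 0
        return title.getTextContent().trim();
    }

    // Hypothetical helper: parses well-formed markup held in a string.
    // With JTidy you would get the Document from new Tidy().parseDOM(in, null) instead.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Example page</title></head>"
                + "<body><img src=\"a.png\"/></body></html>";
        System.out.println(titleOf(parse(page))); // prints "Example page"
    }
}
```

Checking getLength() first avoids a NullPointerException when item(0) returns null on a page without a title.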

2011-05-07 10:44:58 Matt

Thanks Matt, that helped me. – DJ31 2011-05-07 11:31:29

You're welcome. – Matt 2011-05-07 12:15:59

I can't tell you unless you post the code you are actually trying to use to get the title, but this obviously won't work:

list.add(img.item(i).getAttributes().getNamedItem("src").getNodeValue());

because the title element does not have a src attribute.
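To make the failure concrete, here is a small sketch. It assumes the JDK's built-in parser and an in-memory page rather than JTidy, and attributeOrNull is a made-up helper name. getNamedItem("src") returns null for the title element, so chaining getNodeValue() onto it throws a NullPointerException.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class AttributeLookup {

    // Hypothetical helper: returns the attribute value, or null when the node
    // has no attribute with that name.
    static String attributeOrNull(Node node, String name) {
        if (node.getAttributes() == null) {
            return null; // only element nodes carry attributes
        }
        Node attr = node.getAttributes().getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    // Hypothetical helper: parses well-formed markup held in a string.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>Example page</title></head>"
                + "<body><img src=\"a.png\"/></body></html>");
        Node img = doc.getElementsByTagName("img").item(0);
        Node title = doc.getElementsByTagName("title").item(0);
        System.out.println(attributeOrNull(img, "src"));   // prints "a.png"
        System.out.println(attributeOrNull(title, "src")); // prints "null"
    }
}
```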

2011-05-07 06:39:43

You can easily get the title of an HTML page using XPath:

/html/head/title/text()

You can easily do this with Dom4J, and I think with JTidy as well.
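As a rough sketch of the XPath route using only the JDK (javax.xml.xpath) rather than Dom4J: the class name, helper names and the in-memory sample page are assumptions for illustration. I have not checked how the expression behaves against a JTidy-built Document, where the element name case may differ.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathTitle {

    // Evaluates the XPath expression against the document and returns the
    // result as a string; the result is empty when nothing matches.
    static String titleOf(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/html/head/title/text()", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    // Hypothetical helper: parses well-formed markup held in a string.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><head><title>Example page</title></head><body/></html>");
        System.out.println(titleOf(doc)); // prints "Example page"
    }
}
```

Note that this path is case-sensitive and assumes the elements are not in a namespace, so an XHTML page parsed with a namespace-aware parser would need a different expression.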

2011-05-07 08:07:29
