Using XPath and TagSoup in Java
I am using TagSoup from Java to extract some data, but some XPath expressions do not work; I just get empty results.
import java.io.BufferedReader;
import java.io.FileReader;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

// Parse the HTML file with TagSoup as the SAX driver
FileReader frInHtml = new FileReader("doc.html");
BufferedReader brInHtml = new BufferedReader(frInHtml);
SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
// This one works
XPath xpath = XPath.newInstance("/ns:html[1]/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1");
// None of the 3 expressions below worked (each was tried, one at a time, in place of the line above)
// XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[7]/ns:table/ns:tbody/ns:tr/ns:td/ns:h1");
// XPath xpath = XPath.newInstance("//html//body//div[7]//table//tbody//tr//td//h1");
// XPath xpath = XPath.newInstance("/html/body/div[7]/table/tbody/tr/td/h1");
// Bind the "ns" prefix used above to the XHTML namespace
xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
It's hard to say without the XML. I notice that the expression that works does not use the 'tbody' tag, while the other 3 all do. – 2012-01-17 14:02:39
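
To illustrate the two things that differ between the working and failing expressions (the `ns:` prefix and the `tbody` step), here is a minimal sketch. It rests on several assumptions: it uses the JDK's built-in `javax.xml.xpath` instead of JDOM's `XPath`, and a small hand-written XHTML string (`SAMPLE`, hypothetical) stands in for what TagSoup might produce from `doc.html`, which is not shown in the question. The class name `XPathNamespaceDemo` and the `evaluate` helper are made up for the example.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for TagSoup output: every element is in the
    // XHTML namespace, and the table has no tbody element.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div class=\"content\"><table><tr><td><h1>Title</h1></td></tr></table></div>"
        + "</body></html>";

    // Evaluates an XPath 1.0 expression against SAMPLE and returns the
    // string value of the result (an empty string when nothing matches).
    static String evaluate(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep namespace information in the DOM
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "ns" prefix to the XHTML namespace, as addNamespace does in JDOM.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Prefixed steps that match the actual structure
        System.out.println("[" + evaluate(
            "/ns:html/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1") + "]");
        // Unprefixed steps only match elements in no namespace
        System.out.println("[" + evaluate("/html/body/div/table/tr/td/h1") + "]");
        // A tbody step that does not exist in the parsed tree
        System.out.println("[" + evaluate(
            "/ns:html/ns:body/ns:div/ns:table/ns:tbody/ns:tr/ns:td/ns:h1") + "]");
    }
}
```

With this sample, only the first expression finds the `h1`; the other two come back empty. If the real document behaves the same way, the unprefixed expressions fail because of the namespace, and the prefixed one with `ns:tbody` fails because the parsed tree has no `tbody` (browsers add it when displaying a table, but it may not be in the source). Whether `div[7]` is the right index in the original page cannot be checked without the HTML.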