2012-01-17

Using XPath with TagSoup in Java

I am using TagSoup in Java to extract some data, but some of my XPath expressions don't work: they just return empty results.

import java.io.BufferedReader;
import java.io.FileReader;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

// Parse the HTML through TagSoup so that JDOM receives well-formed XML
FileReader frInHtml = new FileReader("doc.html");
BufferedReader brInHtml = new BufferedReader(frInHtml);

SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);

// This one works
XPath xpath = XPath.newInstance("/ns:html[1]/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1");

// None of the 3 lines below worked; I tried them one at a time
// XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[7]/ns:table/ns:tbody/ns:tr/ns:td/ns:h1");
// XPath xpath = XPath.newInstance("//html//body//div[7]//table//tbody//tr//td//h1");
// XPath xpath = XPath.newInstance("/html/body/div[7]/table/tbody/tr/td/h1");

// Bind the "ns" prefix used in the expressions to the XHTML namespace
xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
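It's hard to say without the XML. I notice that the expression that works doesn't use a 'tbody' step, whereas 'tbody' is present in all 3 of the others. – 2012-01-17 14:02:39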
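
To illustrate the point about 'tbody' and the namespace prefix, here is a minimal sketch that uses only the JDK's `javax.xml.xpath` API rather than JDOM's `XPath` class. The `SAMPLE` string is a hypothetical stand-in for the parsed page, not the asker's actual document: it assumes every element is in the XHTML namespace and that the table has no `tbody` element. The class name `XPathNamespaceDemo` and the `count` helper are also made up for this example.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the parsed page: namespaced XHTML without <tbody>.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div class=\"content\"><table><tr><td><h1>Title</h1></td></tr></table></div>"
        + "</body></html>";

    // Returns how many nodes the expression selects in SAMPLE.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, namespaces are not tracked
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "ns" prefix to the XHTML namespace, like addNamespace in JDOM.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Prefixed steps, no tbody: matches the single h1.
        System.out.println(count("/ns:html/ns:body/ns:div/ns:table/ns:tr/ns:td/ns:h1"));
        // Prefixed steps with a tbody step that the document lacks: no match.
        System.out.println(count("/ns:html/ns:body/ns:div/ns:table/ns:tbody/ns:tr/ns:td/ns:h1"));
        // Unprefixed steps only match elements in no namespace: no match.
        System.out.println(count("/html/body/div/table/tr/td/h1"));
    }
}
```

Under those assumptions, only the first expression selects anything. Whether the real document contains `tbody` can only be settled by looking at the XML that the parser actually produced.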

Answers

To debug this, you need to look at the "equivalent XML" that TagSoup generates. For us to help you, you need to show us that XML.
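
One way to inspect the parsed tree is to serialize it back to text and read the result. The sketch below does this with the JDK's identity `Transformer` on a W3C DOM document. This is an assumption made so the example runs on its own, since the question uses JDOM, where `org.jdom.output.XMLOutputter` serves the same purpose. The class name `XmlDumpDemo`, the `dump` helper and the sample markup are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlDumpDemo {
    // Parses the given XML and returns it re-serialized with indentation,
    // so the real element nesting and namespace declarations are visible.
    static String dump(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample; replace it with the XML produced by your parser.
        String sample =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<div><table><tr><td><h1>Title</h1></td></tr></table></div>"
            + "</body></html>";
        System.out.println(dump(sample));
    }
}
```

Reading the dump shows which steps an XPath expression can rely on, for example how many `div` elements precede the table and whether a `tbody` element is present.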