How to get node content from JDOM

I'm writing an application in Java using import org.jdom.*;

My XML is valid, but sometimes it contains HTML tags. For example, something like this:

<program-title>Anatomy &amp; Physiology</program-title> 
    <overview> 
     <content> 
       For more info click <a href="page.html">here</a> 
       <p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p> 
     </content> 
    </overview> 
    <key-information> 
    <category>Health &amp; Human Services</category>

So my problem is with the <p> tag inside the overview.content node.

I was hoping this code would work:

Element overview = sds.getChild("overview");
Element content = overview.getChild("content");

System.out.println(content.getText());

But it returns an empty string.

How can I return all of the text (nested tags and all) from the overview.content node?

Thanks


2011-10-27 jeph perro

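Hi, how can I flatten the content node recursively when the text is mixed with other nodes? For example, a hyperlink in the middle of a sentence. I've added some help.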

I need to get all of the HTML inside the content tag, including links and ordered lists. Thanks

The problem is that the <content> node has no text child; it has a <p> child that happens to contain text.

Try this:

Element overview = sds.getChild("overview"); 
Element content = overview.getChild("content"); 
Element p = content.getChild("p"); 
System.out.println(p.getText());

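If you want all the direct children, call p.getChildren(). If you want all the descendants, you have to call it recursively. (Note that getChildren() returns child elements only; getContent() also returns the text nodes.)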
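A minimal sketch of that recursion, assuming JDOM 1.x (the org.jdom package used in the question) on the classpath and Java 7 or later; the names DescendantWalker, collectDescendants and descendantNames are hypothetical. It walks getChildren() depth-first, so it collects elements only and skips text nodes:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

class DescendantWalker {
    // Depth-first walk: add each child element, then recurse into it.
    // getChildren() returns child elements only, so text nodes are skipped.
    static void collectDescendants(Element parent, List<Element> out) {
        for (Object o : parent.getChildren()) { // raw List in JDOM 1.x
            Element child = (Element) o;
            out.add(child);
            collectDescendants(child, out);
        }
    }

    // Parses 'xml' and returns the names of all elements below the root, in document order.
    static List<String> descendantNames(String xml) {
        Element root;
        try {
            root = new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<Element> all = new ArrayList<Element>();
        collectDescendants(root, all);
        List<String> names = new ArrayList<String>();
        for (Element e : all) {
            names.add(e.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<overview><content>For more info click <a href=\"page.html\">here</a>"
                + "<p>Learn more about the <b>human</b> body.</p></content></overview>";
        System.out.println(descendantNames(xml)); // [content, a, p, b]
    }
}
```

JDOM 1.x also offers Element.getDescendants(Filter), for example with new org.jdom.filter.ElementFilter(), which returns an Iterator over the same elements without hand-written recursion.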


2011-10-27 00:26:23 duffymo

Then I'd just convert the element-type nodes into a text representation manually... probably simpler than I thought.

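You could try the getValue() method as the closest approximation, but what it does is concatenate all the text inside the element and its descendants. It won't give you the <p> tag in any form. If that tag appears in your XML as you've shown it, it has become part of the XML markup. To be treated as text, it would have to be escaped as &lt;p&gt; or embedded in a CDATA section. Alternatively, if you know every element that may or may not appear in the XML, you could apply an XSLT transformation that turns whatever isn't meant to be markup into plain text.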
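To illustrate the difference, here is a small sketch assuming JDOM 1.x and Java 7+ (ValueVsEscapedText and parseRoot are hypothetical names): with real markup, getValue() returns only the concatenated text, while an escaped &lt;a&gt; tag survives as ordinary characters in getText():

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

class ValueVsEscapedText {
    // Parses a string and returns its root element (unchecked on bad input).
    static Element parseRoot(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Real markup: getValue() concatenates the text of all descendants; tags are dropped.
        Element markup = parseRoot(
                "<content>For more info click <a href=\"page.html\">here</a><p>Learn more.</p></content>");
        System.out.println(markup.getValue()); // For more info click hereLearn more.

        // Escaped markup: &lt; and &gt; are plain characters, so getText() returns the tag text.
        Element escaped = parseRoot(
                "<content>For more info click &lt;a href=\"page.html\"&gt;here&lt;/a&gt;</content>");
        System.out.println(escaped.getText()); // For more info click <a href="page.html">here</a>
    }
}
```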


2011-10-27 00:30:03

This is a perfect answer for anyone who doesn't need the element names in mixed content. Thanks!

content.getText() returns only the immediate text, so it is only useful for leaf elements that contain nothing but text.
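A short sketch of what "immediate text" means for the question's <content> element, assuming JDOM 1.x and Java 7+ (ImmediateTextDemo is a hypothetical name). getText() collects only the Text nodes that sit directly under <content>, so the link text and the paragraph are missing:

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

class ImmediateTextDemo {
    // The <content> element from the question, with a shortened paragraph.
    static final String XML = "<content>\n"
            + "  For more info click <a href=\"page.html\">here</a>\n"
            + "  <p>Learn more about the human body.</p>\n"
            + "</content>";

    static Element content() {
        try {
            return new SAXBuilder().build(new StringReader(XML)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element content = content();
        // getText(): only the Text nodes directly under <content>, whitespace included
        System.out.println("[" + content.getText() + "]");
        // getTextTrim(): the same text with leading/trailing whitespace removed
        System.out.println("[" + content.getTextTrim() + "]"); // [For more info click]
        // "here" and the paragraph are missing: they belong to the <a> and <p> children.
    }
}
```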

The trick is to use org.jdom.output.XMLOutputter (here with the compact format, Format.getCompactFormat()):

import java.io.StringWriter;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

public class ContentToString {
    public static void main(String[] args) throws Exception {
        SAXBuilder builder = new SAXBuilder();
        String xmlFileName = "a.xml";
        Document doc = builder.build(xmlFileName);

        Element root = doc.getRootElement();
        Element overview = root.getChild("overview");
        Element content = overview.getChild("content");

        XMLOutputter outp = new XMLOutputter();

        outp.setFormat(Format.getCompactFormat());
        //outp.setFormat(Format.getRawFormat());
        //outp.setFormat(Format.getPrettyFormat());
        //outp.getFormat().setTextMode(Format.TextMode.PRESERVE);

        // Serialize the children of <content> (text and elements), not <content> itself
        StringWriter sw = new StringWriter();
        outp.output(content.getContent(), sw);
        System.out.println(sw.toString());
    }
}

Output:

For more info click<a href="page.html">here</a><p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>

Explore the other formatting options and adapt the code above to your needs.

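"Class to encapsulate XMLOutputter format options. Typical users can use the standard format configurations obtained by getRawFormat() (no whitespace changes), getPrettyFormat() (whitespace beautification), and getCompactFormat() (whitespace normalization)."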
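The same idea can be wrapped in a reusable helper. This is a sketch assuming JDOM 1.x and Java 7+, with InnerXml, innerXml and parse as hypothetical names; it takes the Format as a parameter so the raw, compact and pretty variants can be compared on the same element:

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.Format;
import org.jdom.output.XMLOutputter;

class InnerXml {
    // Serializes the children of 'element' (text, elements, CDATA, ...)
    // without the element's own start and end tags.
    static String innerXml(Element element, Format format) {
        return new XMLOutputter(format).outputString(element.getContent());
    }

    static Element parse(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element content = parse("<content> For more info click <a href=\"page.html\">here</a>"
                + " <p>A&amp;P</p></content>");
        System.out.println(innerXml(content, Format.getRawFormat()));     // whitespace as parsed
        System.out.println(innerXml(content, Format.getCompactFormat())); // whitespace normalized
        System.out.println(innerXml(content, Format.getPrettyFormat()));  // re-indented
    }
}
```

Raw format is the one to pick when the HTML fragment should come back with its spacing untouched.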


2012-01-13 16:42:58

Thanks, man!

Well, maybe this is what you need:

import java.io.StringReader;
import java.util.List;

import org.custommonkey.xmlunit.XMLTestCase;
import org.custommonkey.xmlunit.XMLUnit;
import org.jdom.Content;
import org.jdom.Document;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import org.testng.annotations.Test;
import org.xml.sax.InputSource;

public class HowToGetNodeContentsJDOM extends XMLTestCase 
{ 
    private static final String XML = "<root>\n" + 
      " <program-title>Anatomy &amp; Physiology</program-title>\n" + 
      " <overview>\n" + 
      "  <content>\n" + 
      "    For more info click <a href=\"page.html\">here</a>\n" + 
      "    <p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>\n" + 
      "  </content>\n" + 
      " </overview>\n" + 
      " <key-information>\n" + 
      "  <category>Health &amp; Human Services</category>\n" + 
      " </key-information>\n" + 
      "</root>"; 
    private static final String EXPECTED = "For more info click <a href=\"page.html\">here</a>\n" + 
      "<p>Learn more about the human body. Choose from a variety of Physiology (A&amp;P) designed for complementary therapies.&amp;#160; Online studies options are available.</p>"; 

    @Test 
    public void test() throws Exception 
    { 
     XMLUnit.setIgnoreWhitespace(true); 
     Document document = new SAXBuilder().build(new InputSource(new StringReader(XML))); 
     List<Content> content = document.getRootElement().getChild("overview").getChild("content").getContent(); 
     String out = new XMLOutputter().outputString(content); 
     assertXMLEqual("<root>" + EXPECTED + "</root>", "<root>" + out + "</root>"); 
    } 
}

Output:

PASSED: test on instance null(HowToGetNodeContentsJDOM) 

=============================================== 
    Default test 
    Tests run: 1, Failures: 0, Skips: 0 
===============================================

I'm using JDOM with generics: http://www.junlu.com/list/25/883674.html

Edit: actually this is not much different from Prashant Bhate's answer; maybe you need to tell us what you're missing...


2012-01-16 23:20:58 yankee

Not particularly pretty, but it works (using the JDOM API):

import org.jdom.Attribute;
import org.jdom.Element;
import org.jdom.Text;

public class RawText {
    // Rebuilds the markup of an element's children as a string.
    // Note: text and attribute values are appended unescaped (e.g. "&amp;" comes out as "&"),
    // and comments and processing instructions are skipped.
    public static String getRawText(Element element) {
        StringBuilder text = new StringBuilder();
        for (Object obj : element.getContent()) {
            if (obj instanceof Text) { // also matches CDATA, which extends Text
                text.append(((Text) obj).getText());
            } else if (obj instanceof Element) {
                Element e = (Element) obj;
                text.append("<").append(e.getName());
                // dump all attributes
                for (Object o : e.getAttributes()) {
                    Attribute attribute = (Attribute) o;
                    text.append(" ").append(attribute.getName())
                        .append("=\"").append(attribute.getValue()).append("\"");
                }
                text.append(">");
                // recurse into the child element, then close it
                text.append(getRawText(e)).append("</").append(e.getName()).append(">");
            }
        }
        return text.toString();
    }
}

Prashant Bhate's solution is better, though!


2012-01-17 11:10:02

If you also generate the XML file, you should wrap your HTML data in <![CDATA[...]]> so that the XML parser doesn't parse it as markup.
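A minimal round-trip sketch of the CDATA approach, assuming JDOM 1.x and Java 7+ (CdataRoundTrip, writeContent and readContent are hypothetical names): the producer wraps the HTML in an org.jdom.CDATA node, and the consumer gets the fragment back as one plain string from getText(). Keep in mind that a CDATA section cannot contain the sequence ]]>.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.CDATA;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

class CdataRoundTrip {
    // Producer side: wrap the HTML fragment in a CDATA section inside <content>.
    static String writeContent(String html) {
        Element content = new Element("content");
        content.addContent(new CDATA(html));
        return new XMLOutputter().outputString(content);
    }

    // Consumer side: the CDATA section is read back as plain text.
    static String readContent(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement().getText();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String html = "For more info click <a href=\"page.html\">here</a><p>Learn more.</p>";
        String xml = writeContent(html);
        System.out.println(xml);
        System.out.println(readContent(xml).equals(html)); // true
    }
}
```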


2012-01-18 02:56:39 aoi222

No, unfortunately I don't generate the XML; I just have to consume it.

If you want to output the contents of a JDOM node, just use:

System.out.println(new XMLOutputter().outputString(node))
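A sketch of the difference between the two XMLOutputter overloads, assuming JDOM 1.x and Java 7+ (OuterVsInner and its methods are hypothetical names): passing the Element includes its own tags, while passing getContent() gives only what is inside it, which is what the question asks for.

```java
import java.io.IOException;
import java.io.StringReader;

import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;

class OuterVsInner {
    static Element parse(String xml) {
        try {
            return new SAXBuilder().build(new StringReader(xml)).getRootElement();
        } catch (JDOMException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Element overload: the element itself, including its start and end tags.
    static String outer(Element element) {
        return new XMLOutputter().outputString(element);
    }

    // List overload on getContent(): only what is inside the element.
    static String inner(Element element) {
        return new XMLOutputter().outputString(element.getContent());
    }

    public static void main(String[] args) {
        Element content = parse("<content>For more info click <a href=\"page.html\">here</a></content>");
        System.out.println(outer(content)); // wrapped in <content>...</content>
        System.out.println(inner(content)); // starts directly with the text "For more info click"
    }
}
```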


2016-09-15 09:51:26 lujop
