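I have a similar task, and although the original question is more than a year old, I couldn't find a satisfying answer. The most interesting answer so far is Blaise Doughan's, but I couldn't get it to run on the XML I'm expecting (maybe some parameters of the underlying parser could change that?). Here is the XML, very simplified: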
<many-many-tags>
<description>
...
<p>Lorem ipsum...</p>
Devils inside...
...
</description>
</many-many-tags>
My solution:
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public static String readElementBody(XMLEventReader eventReader)
        throws XMLStreamException {
    StringWriter buf = new StringWriter(1024);
    int depth = 0;
    while (eventReader.hasNext()) {
        // peek at the next event without consuming it
        XMLEvent xmlEvent = eventReader.peek();
        if (xmlEvent.isStartElement()) {
            ++depth;
        } else if (xmlEvent.isEndElement()) {
            --depth;
            // reached the END_ELEMENT of the enclosing element?
            // break out of the loop and leave the event in the stream
            if (depth < 0) {
                break;
            }
        }
        // consume the event
        xmlEvent = eventReader.nextEvent();
        // append the event's XML text to the buffer
        xmlEvent.writeAsEncodedUnicode(buf);
    }
    return buf.toString();
}
Usage:
XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
    XMLEvent xmlEvent = eventReader.nextEvent();
    if (xmlEvent.isStartElement()) {
        StartElement elem = xmlEvent.asStartElement();
        String name = elem.getName().getLocalPart();
        // element names are case-sensitive; this matches <description> above
        if ("description".equals(name)) {
            String xmlFragment = readElementBody(eventReader);
            // do something with it...
            System.out.println("'" + xmlFragment + "'");
        }
    } else if (xmlEvent.isEndElement()) {
        // ...
    }
}
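Note that the extracted XML fragment contains the complete body content, including whitespace and comments. Filtering those out on demand, or making the buffer size configurable, has been left out to keep the code short: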
'
<description>
...
<p>Lorem ipsum...</p>
Devils inside...
...
</description>
'
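For anyone who does need the filtering and the configurable buffer size mentioned above, here is a minimal sketch of how it might look. The class name `FilteredBodyReader`, the method `readElementBodyFiltered`, the helper `extractBody` and the `initialBufferSize` parameter are all made up for this illustration; only the standard `javax.xml.stream` calls are real. It treats text as "whitespace" when it is empty after `trim()`, which is an assumption you may want to change.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of the original answer.
final class FilteredBodyReader {

    // Same depth tracking as readElementBody, but comments and
    // whitespace-only text events are skipped, and the initial
    // buffer size is passed in by the caller.
    static String readElementBodyFiltered(XMLEventReader eventReader, int initialBufferSize)
            throws XMLStreamException {
        StringWriter buf = new StringWriter(initialBufferSize);
        int depth = 0;
        while (eventReader.hasNext()) {
            XMLEvent xmlEvent = eventReader.peek();
            if (xmlEvent.isStartElement()) {
                ++depth;
            } else if (xmlEvent.isEndElement()) {
                --depth;
                if (depth < 0) {
                    break; // leave the enclosing END_ELEMENT in the stream
                }
            }
            xmlEvent = eventReader.nextEvent();
            boolean isComment = xmlEvent.getEventType() == XMLStreamConstants.COMMENT;
            boolean isBlankText = xmlEvent.isCharacters()
                    && xmlEvent.asCharacters().getData().trim().isEmpty();
            if (!isComment && !isBlankText) {
                xmlEvent.writeAsEncodedUnicode(buf);
            }
        }
        return buf.toString();
    }

    // Convenience wrapper for the demo: returns the filtered body of the
    // first element with the given local name, or null if there is none.
    static String extractBody(String xml, String elementName) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isStartElement()
                            && elementName.equals(event.asStartElement().getName().getLocalPart())) {
                        return readElementBodyFiltered(reader, 256);
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<many-many-tags><description>\n"
                + "<!-- a comment -->\n"
                + "<p>Lorem ipsum...</p>\n"
                + "Devils inside...\n"
                + "</description></many-many-tags>";
        System.out.println("'" + extractBody(xml, "description") + "'");
    }
}
```

Text that mixes whitespace with real content (such as the line breaks around "Devils inside...") is kept as-is; only events that are entirely whitespace are dropped. The exact serialization produced by `writeAsEncodedUnicode` can differ between StAX implementations, so check the output against the parser you actually use.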
What exactly is your question? – javamonkey79 2010-12-04 04:58:47