2012-09-29

I am using StAX to parse an XML document and want to know the start and end position of each tag. To do this, I tried using getLocation().getCharacterOffset(), but it returns incorrect values for each tag.

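import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

XMLInputFactory factory = XMLInputFactory.newInstance();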
XMLEventReader reader = factory.createXMLEventReader(
     new StringReader("<root>txt1<tag>txt2</tag></root>")); 

XMLEvent e; 
e = reader.nextEvent(); // START_DOCUMENT 
System.out.println(e); 
System.out.println(e.getLocation()); 
e = reader.nextEvent(); // START_ELEMENT "root" 
System.out.println(e); 
System.out.println(e.getLocation()); 
e = reader.nextEvent(); // CHARACTERS "txt1" 
System.out.println(e); 
System.out.println(e.getLocation()); 
e = reader.nextEvent(); // START_ELEMENT "tag" 
System.out.println(e); 
System.out.println(e.getLocation()); 

The code above prints:

<?xml version="null" encoding='null' standalone='no'?> 
Line number = 1 
Column number = 1 
System Id = null 
Public Id = null 
Location Uri= null 
CharacterOffset = 0 

<root> 
Line number = 1 
Column number = 7 
System Id = null 
Public Id = null 
Location Uri= null 
CharacterOffset = 6 

txt1 
Line number = 1 
Column number = 12 
System Id = null 
Public Id = null 
Location Uri= null 
CharacterOffset = 11 

<tag> 
Line number = 1 
Column number = 16 
System Id = null 
Public Id = null 
Location Uri= null 
CharacterOffset = 15 

The CharacterOffset of 6 for <root> is correct, but after txt1 it is 11, whereas I expected to see 10. What exactly does the offset return?

Answers


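This may be a bug/feature of Sun/Oracle's StAX implementation. With Woodstox you get 0, 0, 6, 10, which seems correct. Download Woodstox from http://wiki.fasterxml.com/WoodstoxHome and add the JARs (woodstox-core + stax2-api) to your classpath. XMLInputFactory will then pick up the Woodstox implementation automatically.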
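
To compare implementations yourself, something like the following may help. It is only a sketch: the class name OffsetDump and the helper offsets() are made up for illustration, and it uses nothing beyond the standard javax.xml.stream API. It prints which XMLInputFactory implementation was selected, then one "eventType@characterOffset" entry per event, so you can run it with and without the Woodstox JARs on the classpath and compare the numbers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of any library.
final class OffsetDump {

    // Returns one "eventType@characterOffset" string per StAX event.
    // eventType is the int constant from XMLStreamConstants
    // (7 = START_DOCUMENT, 1 = START_ELEMENT, 4 = CHARACTERS, ...).
    static List<String> offsets(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent e = reader.nextEvent();
                out.add(e.getEventType() + "@" + e.getLocation().getCharacterOffset());
            }
            reader.close();
        } catch (XMLStreamException ex) {
            // Wrapped so that callers do not need a checked-exception handler.
            throw new IllegalStateException(ex);
        }
        return out;
    }

    public static void main(String[] args) {
        // Shows which implementation the factory lookup actually selected.
        System.out.println(XMLInputFactory.newInstance().getClass().getName());
        for (String entry : offsets("<root>txt1<tag>txt2</tag></root>")) {
            System.out.println(entry);
        }
    }
}
```

With Woodstox on the classpath I would expect the first line to name a class in the com.ctc.wstx package; otherwise it should name the implementation bundled with the JDK. The offsets themselves depend on the implementation, so treat whatever it prints as a description of that parser's behaviour rather than of StAX in general.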