Unmarshalling with StAX: it skips elements if there is no whitespace between them
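I need to parse XML that contains no whitespace. The XML is large, so I use StAX to process each element I'm interested in. I use the default implementation shipped with the JDK.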
Problem
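When an XML element is immediately followed by another element of the same type (e.g. <person>), with no characters between them, the second one is skipped. So if I have a series of them one after another, I only unmarshal 5 persons. For example: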
<people><person>..</person><person>..</person></people>
I built a test that demonstrates this behaviour. The code under test is wrapped in the method countUnmarshalledPersonEntities().
The thing is, when there is whitespace between the elements, like:
<people><person><id>1</id></person> <person><id>2</id></person></people>
it unmarshals both entities, which is fine.
But when there is no whitespace between the sibling nodes, like:
<people><person><id>1</id></person><person><id>2</id></person></people>
the first unmarshal skips the opening tag of the next <person>, so the second person is ignored. I only get 1 entity.
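One way to see how the two inputs differ is to dump the raw StAX event sequence for each. The sketch below uses only the JDK's javax.xml.stream API; the class name EventTrace and the helper trace are made up for illustration and are not part of the test.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EventTrace {

    // Hypothetical helper: lists the events the reader reports for the given XML.
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        events.add("START " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        events.add("END " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        // Whitespace between elements shows up as a CHARACTERS event.
                        events.add(reader.isWhiteSpace() ? "WHITESPACE" : "TEXT");
                        break;
                    case XMLStreamConstants.SPACE:
                        events.add("WHITESPACE");
                        break;
                    default:
                        break; // END_DOCUMENT and the like are not interesting here
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(trace(
                "<people><person><id>1</id></person><person><id>2</id></person></people>"));
        System.out.println(trace(
                "<people><person><id>1</id></person> <person><id>2</id></person></people>"));
    }
}
```

The only difference between the two traces should be one extra whitespace event between the first closing </person> and the second opening <person>, so whatever goes wrong seems to depend on which event directly follows the end of the first person.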
Test
package org.opensource.lab.stream;

import static org.junit.Assert.assertEquals;

import java.io.InputStream;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.apache.commons.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class StreamParserProblemTest {

    private XMLInputFactory xmlif;
    private XMLStreamReader xmlStreamReader;
    private Unmarshaller personUnmarshaller;

    private final InputStream xmlStreamPersonsNoSeparated = IOUtils.toInputStream(
            "<people><person><id>1</id></person><person><id>2</id></person></people>"
    );
    private final InputStream xmlStreamWithPersonsWhitespaceSeparated = IOUtils.toInputStream(
            "<people><person><id>1</id></person> <person><id>2</id></person></people>"
    );

    @Before
    public void setUp() throws Exception {
        JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
        personUnmarshaller = jaxbContext.createUnmarshaller();
        xmlif = XMLInputFactory.newInstance();
    }

    @After
    public void cleanUp() throws Exception {
        if (xmlStreamReader != null) {
            xmlStreamReader.close();
        }
    }

    @XmlRootElement(name = "person")
    static class Person {
        String id;
    }

    @Test
    public void whenNoSpacesBetweenNodes_shouldFind2Persons_FAIL() throws Exception {
        xmlStreamReader = xmlif.createXMLStreamReader(xmlStreamPersonsNoSeparated, "UTF-8");
        int personTagsFound = countUnmarshalledPersonEntities();
        // JUnit expects (expected, actual)
        assertEquals(2, personTagsFound);
    }

    /**
     * I don't know why, but if there's at least one whitespace character
     * between nodes of the same type, it won't skip.
     *
     * @throws Exception in a test
     */
    @Test
    public void whenWithSpacesBetweenNodes_shouldFind2Persons_SUCCESS() throws Exception {
        xmlStreamReader = xmlif.createXMLStreamReader(xmlStreamWithPersonsWhitespaceSeparated, "UTF-8");
        int personTagsFound = countUnmarshalledPersonEntities();
        // JUnit expects (expected, actual)
        assertEquals(2, personTagsFound);
    }

    /**
     * CODE to test.
     *
     * @return number of unmarshalled persons (people).
     * @throws Exception if reading the stream or unmarshalling fails
     */
    private int countUnmarshalledPersonEntities() throws Exception {
        int personTagsFound = 0;
        while (xmlStreamReader.hasNext()) {
            int type = xmlStreamReader.next();
            if (type == XMLStreamConstants.START_ELEMENT
                    && xmlStreamReader.getName().toString().equalsIgnoreCase("person")) {
                personUnmarshaller.unmarshal(xmlStreamReader, Person.class);
                personTagsFound++;
            }
        }
        return personTagsFound;
    }
}
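As far as the Javadoc of Unmarshaller.unmarshal(XMLStreamReader, Class) goes, a successful call leaves the reader pointing at the token right after the end event of the unmarshalled element. The sketch below only illustrates what that would imply for a loop shaped like the one above. It is not the test itself: to stay runnable without JAXB it uses a hypothetical skipElement helper as a stand-in for the unmarshal call, and the class name CursorSketch and the method count are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class CursorSketch {

    // Hypothetical stand-in for unmarshal(): consumes the element the reader is on
    // and leaves the reader on the event right after its END_ELEMENT.
    static void skipElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        reader.next(); // step past the matching END_ELEMENT
    }

    static boolean atPersonStart(XMLStreamReader reader) {
        return reader.getEventType() == XMLStreamConstants.START_ELEMENT
                && "person".equals(reader.getLocalName());
    }

    // alwaysAdvance = true mirrors the loop in the test: next() on every iteration.
    // alwaysAdvance = false re-checks the current event after an element was consumed.
    static int count(String xml, boolean alwaysAdvance) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int found = 0;
            while (reader.hasNext()) {
                if (alwaysAdvance) {
                    reader.next();
                    if (atPersonStart(reader)) {
                        skipElement(reader);
                        found++;
                    }
                } else if (atPersonStart(reader)) {
                    skipElement(reader); // the cursor may now already sit on the next <person>
                    found++;
                } else {
                    reader.next();
                }
            }
            reader.close();
            return found;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String noSpace = "<people><person><id>1</id></person><person><id>2</id></person></people>";
        String spaced = "<people><person><id>1</id></person> <person><id>2</id></person></people>";
        System.out.println(count(noSpace, true) + " / " + count(spaced, true));
        System.out.println(count(noSpace, false) + " / " + count(spaced, false));
    }
}
```

Under that assumption, the always-advancing variant counts 1 person for the input without whitespace and 2 for the input with whitespace, matching the two tests, while the re-checking variant counts 2 for both. Whether the real unmarshal call behaves exactly like the stand-in would still need to be confirmed against the JAXB implementation in use.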
Any ideas what's wrong with the code?
Thanks.