2015-06-23

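Unmarshalling using StAX - it skips elements if there is no whitespace between them

I need to parse XML that contains no whitespace. The XML is large, so I use StAX to process each element I'm interested in. I use the default implementation that ships with the JDK.

Problem

When an XML element is immediately followed by another element of the same type (e.g. <person>), with no characters between them, the second one is skipped. So when they come one after another, only every other element is unmarshalled, and I end up with just 5 persons. For example: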

<people><person>..</person><person>..</person></people> 

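I built a test to show this behavior; the code in question is wrapped in the method countUnmarshalledPersonEntities().

The thing is, when there is whitespace between the elements, such as: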

<people><person><id>1</id></person> <person><id>2</id></person></people> 

it unmarshals both entities, which is fine.

But when there is no whitespace between the nodes, like:

<people><person><id>1</id></person><person><id>2</id></person></people> 

the first unmarshal moves past the next opening <person> tag, and the second person is ignored. I only parse 1 entity.

Test

package org.opensource.lab.stream; 

import static org.junit.Assert.assertEquals; 

import java.io.InputStream; 

import javax.xml.bind.JAXBContext; 
import javax.xml.bind.Unmarshaller; 
import javax.xml.bind.annotation.XmlRootElement; 
import javax.xml.stream.XMLInputFactory; 
import javax.xml.stream.XMLStreamConstants; 
import javax.xml.stream.XMLStreamReader; 

import org.apache.commons.io.IOUtils; 
import org.junit.After; 
import org.junit.Before; 
import org.junit.Test; 

public class StreamParserProblemTest { 
    private XMLInputFactory xmlif; 
    private XMLStreamReader xmlStreamReader; 
    private Unmarshaller personUnmarshaller; 

    private final InputStream xmlStreamPersonsNoSeparated = IOUtils.toInputStream(
      "<people><person><id>1</id></person><person><id>2</id></person></people>" 
      ); 
    private final InputStream xmlStreamWithPersonsWhitespaceSeparated = IOUtils.toInputStream(
      "<people><person><id>1</id></person> <person><id>2</id></person></people>" 
      ); 

    @Before 
    public void setUp() throws Exception { 
     JAXBContext jaxbContext = JAXBContext.newInstance(Person.class); 
     personUnmarshaller = jaxbContext.createUnmarshaller(); 
     xmlif = XMLInputFactory.newInstance(); 
    } 

    @After 
    public void cleanUp() throws Exception { 
     if(xmlStreamReader != null) { 
      xmlStreamReader.close(); 
     } 
    } 

    @XmlRootElement(name = "person") 
    static class Person { 
     String id; 
    } 

    @Test 
    public void whenNoSpacesBetweenNodes_shouldFind2Persons_FAIL() throws Exception { 
     xmlStreamReader = xmlif.createXMLStreamReader(xmlStreamPersonsNoSeparated, "UTF-8"); 

     int personTagsFound = countUnmarshalledPersonEntities(); 

     assertEquals(2, personTagsFound); 
    } 

    /** 
    * I don't know why, but if there's at least one whitespace character between nodes of the same type, it won't skip. 
    * 
    * @throws Exception in a test 
    */ 
    @Test 
    public void whenWithSpacesBetweenNodes_shouldFind2Persons_SUCCESS() throws Exception { 
     xmlStreamReader = xmlif.createXMLStreamReader(xmlStreamWithPersonsWhitespaceSeparated, "UTF-8"); 

     int personTagsFound = countUnmarshalledPersonEntities(); 

     assertEquals(2, personTagsFound); 
    } 

    /** 
    * CODE to test. 
    * 
    * @return number of unmarshalled persons (people). 
    * @throws Exception 
    */ 
    private int countUnmarshalledPersonEntities() throws Exception { 
     int personTagsFound = 0; 

     while (xmlStreamReader.hasNext()) { 
      int type = xmlStreamReader.next(); 

      if (type == XMLStreamConstants.START_ELEMENT && xmlStreamReader.getName().toString().equalsIgnoreCase("person")) { 
       personUnmarshaller.unmarshal(xmlStreamReader, Person.class); 
       personTagsFound++; 
      } 
     } 

     return personTagsFound; 
    } 
} 

Any ideas what is wrong with the code?

Thanks.

Answer


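Thanks for attaching the unit test, it really makes this easier to understand!

When you call unmarshal on the xmlStreamReader, the reader is advanced with next for as long as there are tags that belong to your entity, and it ends up on the event right after the closing tag. So after your closing </person> tag, it is already pointing at the opening <person> tag of the next entity. By calling xmlStreamReader.next() in the next iteration, you skip it. This doesn't happen when there is whitespace between your entities, because after unmarshalling the reader points at the whitespace, and it is the whitespace that your next() call skips.

The following modified loop works for me; both of your unit tests pass: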
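If you want to see the reader position for yourself without JAXB in the picture, here is a minimal sketch using only StAX. Note that consumeElement and eventAfterFirstPerson are names I made up for illustration: consumeElement is a stand-in that assumes unmarshal leaves the reader on the event right after the end tag, and the sketch assumes the input contains at least one person element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ReaderPositionSketch {

    // Hypothetical stand-in for unmarshal(reader, type): consumes the element the
    // reader is on and stops on the event right after the matching end tag.
    static void consumeElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1; // the reader is already on the element's START_ELEMENT
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        reader.next(); // step past the end tag
    }

    // Returns the event type the reader is on once the first person was consumed.
    // Assumes the document contains at least one person element.
    static int eventAfterFirstPerson(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // Walk forward until the reader sits on the first <person> start tag.
                while (!(reader.isStartElement() && "person".equals(reader.getLocalName()))) {
                    reader.next();
                }
                consumeElement(reader);
                return reader.getEventType();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String tight = "<people><person><id>1</id></person><person><id>2</id></person></people>";
        String spaced = "<people><person><id>1</id></person> <person><id>2</id></person></people>";
        // Prints whether the reader already sits on the next start tag.
        System.out.println("no whitespace, on START_ELEMENT: "
                + (eventAfterFirstPerson(tight) == XMLStreamConstants.START_ELEMENT));
        System.out.println("with whitespace, on START_ELEMENT: "
                + (eventAfterFirstPerson(spaced) == XMLStreamConstants.START_ELEMENT));
    }
}
```

Without whitespace the reader should already be on the second <person> start tag, so one more next() moves past it; with whitespace it should be on the text event for the space instead.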

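while (xmlStreamReader.hasNext()) { 
    if (xmlStreamReader.isStartElement() && xmlStreamReader.getName().toString().equalsIgnoreCase("person")) { 
        // unmarshal() leaves the reader on the event right after </person>, so do not call next() here 
        personUnmarshaller.unmarshal(xmlStreamReader, Person.class); 
        personTagsFound++; 
    } else { 
        // only advance manually when nothing was unmarshalled in this iteration 
        xmlStreamReader.next(); 
    } 
}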
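
To compare the two loop shapes side by side without needing JAXB on the classpath, here is a rough sketch that runs both over the XML from the question. Everything named here (PersonLoopSketch, consumeElement, countAlwaysAdvancing, countAdvancingOnlyWhenIdle) is hypothetical; consumeElement only imitates the part of unmarshal that matters, namely that the reader ends up on the event right after the end tag.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PersonLoopSketch {

    // Hypothetical stand-in for personUnmarshaller.unmarshal(reader, Person.class):
    // consumes the current element and stops on the event right after its end tag.
    static void consumeElement(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        reader.next();
    }

    static boolean onPersonStart(XMLStreamReader reader) {
        return reader.isStartElement() && "person".equalsIgnoreCase(reader.getLocalName());
    }

    static XMLStreamReader newReader(String xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    // Loop shape from the question: next() is called on every iteration,
    // even right after an element was consumed.
    static int countAlwaysAdvancing(String xml) {
        try {
            XMLStreamReader reader = newReader(xml);
            try {
                int found = 0;
                while (reader.hasNext()) {
                    reader.next();
                    if (onPersonStart(reader)) {
                        consumeElement(reader);
                        found++;
                    }
                }
                return found;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    // Loop shape from the answer: next() is called only when nothing was consumed.
    static int countAdvancingOnlyWhenIdle(String xml) {
        try {
            XMLStreamReader reader = newReader(xml);
            try {
                int found = 0;
                while (reader.hasNext()) {
                    if (onPersonStart(reader)) {
                        consumeElement(reader);
                        found++;
                    } else {
                        reader.next();
                    }
                }
                return found;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String tight = "<people><person><id>1</id></person><person><id>2</id></person></people>";
        System.out.println("always advancing: " + countAlwaysAdvancing(tight));
        System.out.println("advancing only when idle: " + countAdvancingOnlyWhenIdle(tight));
    }
}
```

The first loop should miss the second person in the input without whitespace, while the second loop should find both in either input, which matches what the two unit tests show.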