Invalid XML character during unmarshal

I am marshalling an object to an XML file using the encoding "UTF-8". The file is generated successfully. But when I try to unmarshal it back, I get an error:

An invalid XML character (Unicode: 0x{2}) was found in the value of attribute "{1}" and element is "0"

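The character is 0x1A (\u001a), which is valid in UTF-8 but illegal in XML. The JAXB Marshaller allows this character to be written to the XML file, but the Unmarshaller cannot parse it. I tried other encodings (UTF-16, ASCII, etc.) but still get the error.

The common solution is to remove or replace this invalid character before XML parsing. But if we need this character, how can we get the original character back after unmarshalling?

While looking for a solution, I decided to replace the invalid characters with a substitute character (for example a dot, ".") before unmarshalling.

I created this class: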

public class InvalidXMLCharacterFilterReader extends FilterReader { 

    public static final char substitute = '.'; 

    public InvalidXMLCharacterFilterReader(Reader in) { 
     super(in); 
    } 

    @Override 
    public int read(char[] cbuf, int off, int len) throws IOException { 

     int read = super.read(cbuf, off, len); 

     if (read == -1) 
      return -1; 

     for (int readPos = off; readPos < off + read; readPos++) { 
      if(!isValid(cbuf[readPos])) { 
        cbuf[readPos] = substitute; 
      } 
     } 

     return readPos - off + 1; 
    } 

    public boolean isValid(char c) { 
     if((c == 0x9) 
       || (c == 0xA) 
       || (c == 0xD) 
       || ((c >= 0x20) && (c <= 0xD7FF)) 
       || ((c >= 0xE000) && (c <= 0xFFFD)) 
       || ((c >= 0x10000) && (c <= 0x10FFFF))) 
     { 
      return true; 
     } else 
      return false; 
    } 
}

And this is how I read and unmarshal the file:

FileReader fileReader = new FileReader(this.getFile()); 
Reader reader = new InvalidXMLCharacterFilterReader(fileReader); 
Object o = (Object)um.unmarshal(reader);

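Somehow the reader does not replace the invalid characters with the substitute character I chose, so the XML data still cannot be unmarshalled. Is there something wrong with my InvalidXMLCharacterFilterReader class?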
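For reference, here is a minimal sketch of how the class above might be corrected. It is not a confirmed fix from this thread. As posted, read(char[], int, int) returns an expression built on readPos, which is out of scope after the loop, so the sketch returns the count reported by super.read instead. It also overrides the single-character read(), because FilterReader passes that call straight to the wrapped reader. Finally, the posted isValid can never match the 0x10000 to 0x10FFFF range with a char argument, so the sketch lets surrogate chars through without checking that they are paired. The class name SubstitutingXmlReader and the filter helper are hypothetical.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical name: a corrected variant of InvalidXMLCharacterFilterReader.
class SubstitutingXmlReader extends FilterReader {

    static final char SUBSTITUTE = '.';

    SubstitutingXmlReader(Reader in) {
        super(in);
    }

    // FilterReader.read() delegates directly to the wrapped reader,
    // so the single-character path needs its own filtering.
    @Override
    public int read() throws IOException {
        int c = super.read();
        return (c == -1 || isValid((char) c)) ? c : SUBSTITUTE;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int read = super.read(cbuf, off, len);
        if (read == -1) {
            return -1;
        }
        for (int pos = off; pos < off + read; pos++) {
            if (!isValid(cbuf[pos])) {
                cbuf[pos] = SUBSTITUTE;
            }
        }
        return read; // the number of chars actually placed in cbuf
    }

    // XML 1.0 Char ranges that fit in a single char. Surrogates are let
    // through so supplementary characters survive (assumption: they are
    // correctly paired; this sketch does not verify that).
    static boolean isValid(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || Character.isSurrogate(c);
    }

    // Demo helper (hypothetical): runs a string through the reader.
    static String filter(String input) {
        try (Reader r = new SubstitutingXmlReader(new StringReader(input))) {
            StringBuilder out = new StringBuilder();
            char[] buf = new char[16];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that substituting a dot is lossy: the original character cannot be recovered after unmarshalling. Separately, FileReader decodes with the platform default charset, so wrapping a FileInputStream in an InputStreamReader with an explicit "UTF-8" charset may be safer when the file was written as UTF-8.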

Source

2011-04-28 oliverwood

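Can you check the XML header to see which charset it declares after encoding? Is it UTF-8? – JMelnik 2011-04-28 07:33:03

No charset is defined in the XML header, only <?xml version="1.0"?>. But I have already set this: 'm.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");' – oliverwood 2011-04-28 07:52:23

I think the main problem is escaping illegal characters during marshalling. Something similar was mentioned here; you can try it.

Also try changing the encoding to Unicode: marshaller.setProperty("jaxb.encoding", "Unicode");

Source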

2011-04-28 07:41:41 JMelnik

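I tried escaping the character 0x1a as the character reference "&#x1a;" during marshalling and changing the encoding to "Unicode", but unmarshalling still fails: _Character reference "&#x1A" is an invalid XML character._ – oliverwood 2011-05-03 10:21:10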

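The Unicode character U+001A is illegal in XML 1.0:

The encoding used to represent it does not matter in this case; it is simply not allowed in XML content.

XML 1.1 allows some of the restricted characters (including U+001A) to be included, but they must be present as numeric character references (&#x1a;).

Wikipedia has a nice summary of the situation.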
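The point can be checked with a small sketch that uses only the JDK's built-in DOM parser rather than JAXB. It feeds the parser a document containing a raw U+001A and one containing the numeric character reference, both as XML 1.0. The class name Xml10CharDemo and the parses helper are hypothetical. An XML 1.1 variant is included for comparison, but whether it is accepted depends on the parser in use.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original thread.
class Xml10CharDemo {

    // Returns true if the default DOM parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an illegal character
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plain content, for reference.
        System.out.println(parses("<a>ok</a>"));
        // Raw U+001A in element content, XML 1.0.
        System.out.println(parses("<a>\u001A</a>"));
        // The same character as a numeric character reference, XML 1.0.
        System.out.println(parses("<a>&#x1A;</a>"));
        // The reference under an XML 1.1 declaration, for comparison.
        System.out.println(parses("<?xml version=\"1.1\"?><a>&#x1A;</a>"));
    }
}
```

The parser's default error handler may also print a fatal-error line to standard error for each rejected document.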

Source

2011-04-28 07:55:41

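Do you know how to set a Marshaller property so that the output has an XML 1.1 header? I tried this, but it does not work: 'm.setProperty("com.sun.xml.bind.xmlHeaders", "<?xml version=\"1.1\"?>");' – oliverwood 2011-04-28 08:20:22

It looks like JAXB doesn't support XML 1.1 yet: http://java.net/jira/browse/JAXB-422 – 2011-04-28 08:23:23

I don't think the offending character is 0x02. Note the curly braces around {2} and {1}; this looks more like placeholders in the error message that were never substituted. – 2011-04-28 09:10:11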
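Coming back to the original question of how to get the character back after unmarshalling: since XML 1.0 cannot carry U+001A at all, one option is to escape such characters at the application level before marshalling and reverse the escaping afterwards. The sketch below is an assumption, not something proposed in this thread. The class XmlCharEscaper and its escape format (a backslash, the letter u and four hex digits, with literal backslashes doubled) are invented for illustration.

```java
// Hypothetical helper: a reversible text escape for characters that
// XML 1.0 forbids. The format is an invented convention.
final class XmlCharEscaper {

    private XmlCharEscaper() {
    }

    // The XML 1.0 Char production, checked on whole code points.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Doubles backslashes and replaces each forbidden code point with a
    // backslash, the letter u and four hex digits. Every forbidden code
    // point is below 0x10000, so four digits are always enough.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\\') {
                sb.append("\\\\");
            } else if (isXml10Char(cp)) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("\\u%04X", cp));
            }
        }
        return sb.toString();
    }

    // Reverses escape(). Assumes the input was produced by escape().
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                sb.append(c);
                continue;
            }
            char next = s.charAt(i + 1);
            if (next == '\\') {
                sb.append('\\');
                i++; // skip the second backslash
            } else if (next == 'u' && i + 5 < s.length()) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 5; // skip the letter u and the four hex digits
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With JAXB, these two methods could plausibly be called from an XmlAdapter<String, String> attached to the affected fields, so that marshal applies escape and unmarshal applies unescape. That wiring is left out here because it has not been tested against this setup.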
