Parsing an XML string from UTF-16LE in Java

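I want to parse a UTF-16LE XML string that is embedded in a file. I can read the actual string into a String object, and when I inspect it in the watch window the XML looks fine. The problem is that parsing it keeps throwing an exception. I have tried specifying UTF-16 and UTF-16LE both in the getBytes call and in the InputStreamReader constructor, but it still throws.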

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); 
DocumentBuilder builder = null; 

builder = builderFactory.newDocumentBuilder();  
Document document = null; 
byte[] bytes = xmlString.getBytes(); 
ByteArrayInputStream inputStream = new ByteArrayInputStream(bytes); 
InputSource is = new InputSource(new InputStreamReader(inputStream)); 
document = builder.parse(is); // throws SAXParseException

Edit: This is on Android. Also, here is the exception at the top of the stack trace:

12-18 13:51:12.978: W/System.err(5784): org.xml.sax.SAXParseException: name expected (position:START_TAG @1:2 in [email protected])
12-18 13:51:12.978: W/System.err(5784): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:107)


2012-12-17 rplankenhorn

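What is wrmHeaderXml? A String, an object, or what? It looks like you are converting from bytes to characters and then from characters back to bytes. Why? If you already have the bytes, just feed them to InputSource(InputStream). – leonbloy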
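As a rough illustration of the comment above, here is a minimal sketch of handing undecoded bytes to the parser. It assumes the bytes start with the 0xFF 0xFE byte order mark, so the parser can work out the encoding itself. The class name RawBytesParser and the in-memory byte array standing in for the file contents are hypothetical, and parser behavior on Android may differ from a desktop JVM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name is illustrative and not from the thread.
class RawBytesParser {

    // Passes the raw bytes to the parser without decoding them first,
    // leaving encoding detection (BOM / XML declaration) to the parser.
    static Document parse(byte[] rawXml) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(rawXml));
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for bytes read from the file (assumption): encoding U+FEFF
        // as UTF-16LE yields the bytes FF FE, followed by the UTF-16LE text.
        byte[] rawXml = "\uFEFF<root><tag>Hello World!</tag></root>"
                .getBytes(Charset.forName("UTF-16LE"));
        Document dom = parse(rawXml);
        System.out.println(dom.getDocumentElement().getNodeName());
    }
}
```

The point of this shape is that no Reader is involved, so there is no chance of decoding the bytes with the wrong charset before the parser sees them.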

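I suppose it's a String. If you have a String object (you state that you can view it in the console), then the internal encoding doesn't matter, because it is a Java String. – Raffaele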

This is what I ended up doing:

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance(); 
DocumentBuilder builder = null; 

builder = builderFactory.newDocumentBuilder();  
Document document = null; 
byte[] bytes = Charset.forName("UTF-16LE").encode(xmlString).array(); 
InputStream inputStream = new ByteArrayInputStream(bytes); 
document = builder.parse(inputStream);

Source: How does one create an InputStream from a String?


2012-12-17 19:18:36 rplankenhorn

What is the purpose of encoding the String? – Raffaele

If I just call xmlString.getBytes() and pass the result to a ByteArrayInputStream, it throws a SAXParseException. – rplankenhorn

But why do you need to extract bytes from the String at all? Just pass a StringReader (http://docs.oracle.com/javase/6/docs/api/java/io/StringReader.html) to the InputSource constructor. – Raffaele

There is no need to convert back and forth between Strings and bytes within the same program. It is as easy as:

String xml = "<root><tag>Hello World!</tag></root>"; 

Document dom = DocumentBuilderFactory.newInstance() 
    .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));


2012-12-17 22:37:25 Raffaele

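This throws a SAXParseException on the parse line. – rplankenhorn

No need to be rude. When I try the parse line above with the XML I am parsing, it throws a SAXParseException. I posted the top of the stack trace above. If I just call xmlString.getBytes() and look at the binary data, it is UTF-16LE encoded. The first two bytes are 0xFF 0xFE, which tells me it is little-endian UTF-16. – rplankenhorn

@rplankenhorn It sounds like your xmlString actually contains the BOM as its first character. If you strip that first character from the String and then create a StringReader from the result, it should parse straight from the String without the round trip through bytes.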
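The suggestion in the last comment could look roughly like the sketch below. It assumes the String begins with a single U+FEFF character, which is what a decoded byte order mark becomes. The class BomStrippingParser and its methods are hypothetical names introduced only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; names are illustrative and not from the thread.
class BomStrippingParser {

    // Drops one leading U+FEFF (byte order mark) if the string starts with it;
    // otherwise returns the string unchanged.
    static String stripBom(String s) {
        return (!s.isEmpty() && s.charAt(0) == '\uFEFF') ? s.substring(1) : s;
    }

    // Parses straight from characters, so no charset conversion is involved.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(stripBom(xml))));
    }

    public static void main(String[] args) throws Exception {
        // Example input with a leading BOM character (assumed shape of xmlString).
        String xmlString = "\uFEFF<root><tag>Hello World!</tag></root>";
        Document dom = parse(xmlString);
        System.out.println(dom.getDocumentElement().getNodeName());
    }
}
```

This would also be consistent with the "name expected ... @1:2" message in the stack trace, since a BOM character sitting before the first tag is a plausible cause of an error that early in the input.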
