2011-03-16

XML encoding problem

I am using the iterator-style StAX API to parse an XML stream.

I wrote a small program that splits a large XML file into several smaller files.

The input stream is read correctly, but when writing I get strange characters (an encoding problem).

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.events.Characters;
import javax.xml.stream.events.XMLEvent;

public class Split
{
    public static void main(String[] args) throws Exception
    {
        int offre = 0;   // OFFER elements counted since the last split
        int i = 0;       // index of the current output file
        String Data;
        String nom = "flux0.xml";
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new FileInputStream("CJ.xml"));
        FileOutputStream output = new FileOutputStream(nom);
        XMLOutputFactory xmlof = XMLOutputFactory.newInstance();
        // No output encoding is specified here
        XMLEventWriter writer = xmlof.createXMLEventWriter(output);
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        while (reader.hasNext())
        {
            XMLEvent event = reader.nextEvent();

            // Compare element names with equals(), not ==
            if (event.isStartElement()
                    && "OFFER".equals(event.asStartElement().getName().getLocalPart()))
            {
                offre++;
            }
            if (offre == 5000)
            {
                // Finish the current file, then continue in the next flux<i>.xml
                writer.close();
                output.close();
                i++;
                nom = "flux" + i + ".xml";
                output = new FileOutputStream(nom);
                writer = xmlof.createXMLEventWriter(output);
                offre = 0;
            }
            if (event.getEventType() == XMLStreamConstants.CHARACTERS)
            {
                Characters characters = event.asCharacters();
                String texte = characters.getData();
                // Encodes the text to UTF-8 bytes, then turns those bytes back
                // into a String using the platform default charset
                CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
                Data = new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array());
                writer.add(eventFactory.createCharacters(Data));
            }
            else
            {
                writer.add(event);
            }
            writer.flush();
        }
        writer.close();
        output.close();
    }
}
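One likely source of the strange characters is the re-encoding step inside the loop: `encoder.encode(...)` produces UTF-8 bytes, and `new String(byte[])` decodes them with the platform's default charset, so on a system whose default is not UTF-8 each non-ASCII character turns into two or more wrong ones (for example "café" becomes "cafÃ©" under ISO-8859-1). In addition, `ByteBuffer.array()` returns the whole backing array, which can be longer than the encoded data. The sketch below reproduces the effect, assuming ISO-8859-1 as a stand-in for a non-UTF-8 platform default; the class name `MojibakeDemo` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Same round trip as in the question, but the final decoding step uses an
    // explicit non-UTF-8 charset (assumed stand-in for the platform default)
    static String roundTrip(String text) throws CharacterCodingException {
        ByteBuffer utf8 = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(text));
        // Copy only the encoded bytes; array() could also expose unused capacity
        byte[] bytes = new byte[utf8.remaining()];
        utf8.get(bytes);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws CharacterCodingException {
        String garbled = roundTrip("caf\u00e9");
        // prints 5: the two UTF-8 bytes of the accented e became two characters
        System.out.println(garbled.length());
    }
}
```

Passing the text through unchanged avoids this entirely, because the writer already encodes characters when it produces bytes.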

Answers


This code forces the character encoding used by the writer:

String outputEncoding = "UTF-8";
FileOutputStream fos = new FileOutputStream(aFile);
OutputStreamWriter osw = new OutputStreamWriter(fos, outputEncoding);
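A minimal sketch of how that `OutputStreamWriter` might be combined with StAX, assuming it is handed to `XMLOutputFactory.createXMLEventWriter(Writer)` and that events are copied unchanged (no manual re-encoding). The class and method names are hypothetical, and a `ByteArrayOutputStream` stands in for the `FileOutputStream` so the bytes can be inspected:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

public class ForcedEncodingCopy {

    // Copies every StAX event of xml into a byte array; the OutputStreamWriter
    // (not the StAX writer) decides that the bytes are UTF-8
    static byte[] copyAsUtf8(String xml) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(); // stands in for the file
        Writer osw = new OutputStreamWriter(bytes, StandardCharsets.UTF_8);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(osw);
        while (reader.hasNext()) {
            writer.add(reader.nextEvent()); // events are passed through unchanged
        }
        writer.flush();
        writer.close();
        osw.close(); // flushes anything still buffered in the OutputStreamWriter
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] out = copyAsUtf8("<offer>caf\u00e9</offer>");
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

In the question's program the same idea would mean wrapping each `FileOutputStream` in such a writer before calling `xmlof.createXMLEventWriter(...)`.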

Thanks, that solved the problem. – timo 2011-03-16 16:07:04


Isn't this block of code completely unnecessary?

Characters characters = event.asCharacters(); 
String texte=characters.getData(); 
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder(); 
Data= new String(encoder.encode(CharBuffer.wrap(texte.toCharArray())).array()); 
writer.add(eventFactory.createCharacters(Data)); 

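Why can't you just pass the event to the writer unchanged? If you need the file in a specific encoding, there is a factory method that takes the charset as a parameter: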

FileOutputStream output = new FileOutputStream(nom); 
XMLOutputFactory xmlof = XMLOutputFactory.newInstance(); 
XMLEventWriter writer = xmlof.createXMLEventWriter(output, "utf-8"); 
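Applied to the splitting task from the question, a minimal sketch of this advice might look as follows. It assumes each chunk should hold at most `perChunk` `OFFER` elements wrapped in a hypothetical `<OFFERS>` root element so every chunk is well-formed on its own, and it writes to in-memory buffers instead of `flux<i>.xml` files; `OfferSplitter`, `openChunk`, `finishChunk` and `split` are hypothetical names. Events are handed to the writer unchanged, and the encoding is set only through `createXMLEventWriter(OutputStream, String)`:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class OfferSplitter {

    private static final XMLOutputFactory OUT = XMLOutputFactory.newInstance();
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // Starts a chunk: UTF-8 writer, XML declaration, hypothetical <OFFERS> root
    static XMLEventWriter openChunk(OutputStream out) throws XMLStreamException {
        XMLEventWriter w = OUT.createXMLEventWriter(out, "UTF-8");
        w.add(EVENTS.createStartDocument("UTF-8", "1.0"));
        w.add(EVENTS.createStartElement("", "", "OFFERS"));
        return w;
    }

    // Closes the root element and the document, then flushes the writer
    static void finishChunk(XMLEventWriter w) throws XMLStreamException {
        w.add(EVENTS.createEndElement("", "", "OFFERS"));
        w.add(EVENTS.createEndDocument());
        w.flush();
        w.close();
    }

    // Splits xml into chunks of at most perChunk OFFER elements each
    static List<String> split(String xml, int perChunk) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<ByteArrayOutputStream> buffers = new ArrayList<>();
        XMLEventWriter writer = null;
        int inChunk = 0; // OFFER elements in the current chunk
        int depth = 0;   // element depth inside the current OFFER (0 = outside)

        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (depth == 0) {
                boolean offerStart = e.isStartElement()
                        && "OFFER".equals(e.asStartElement().getName().getLocalPart());
                if (!offerStart) {
                    continue; // skip everything outside OFFER elements
                }
                if (writer == null || inChunk == perChunk) {
                    if (writer != null) {
                        finishChunk(writer);
                    }
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    buffers.add(buf);
                    writer = openChunk(buf);
                    inChunk = 0;
                }
                inChunk++;
            }
            writer.add(e); // passed through unchanged, no manual re-encoding
            if (e.isStartElement()) {
                depth++;
            } else if (e.isEndElement()) {
                depth--;
            }
        }
        if (writer != null) {
            finishChunk(writer);
        }
        List<String> chunks = new ArrayList<>();
        for (ByteArrayOutputStream buf : buffers) {
            chunks.add(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<CJ><OFFER>a</OFFER><OFFER>caf\u00e9</OFFER><OFFER>c</OFFER></CJ>";
        for (String chunk : split(xml, 2)) {
            System.out.println(chunk);
        }
    }
}
```

Writing to `new FileOutputStream("flux" + i + ".xml")` instead of a `ByteArrayOutputStream` would only change the stream passed to `openChunk`.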

I tried this: XMLEventWriter writer = xmlof.createXMLEventWriter(output, "UTF-8"); Here is a small extract of the result: – timo 2011-03-16 15:52:41