2012-10-22 85 views
0

Java XML parser error: invalid character (Unicode 0x1A) when copying/pasting from Word

Sorry for the double post, but my previous post was about the Flex side:

Flex TextArea - copy/paste from Word - Invalid unicode characters on xml parsing

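Now I am posting this for the Java side.

The problem is:

We have an email feature (part of our application) in which we create an XML string and put it on a queue. Another application picks it up, parses the XML, and sends the email.

We get an XML parser exception when the email body (<BODY>....</BODY>) has been copied and pasted from Word: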

Invalid character in attribute value BODY (Unicode: 0x1A) 

Since we use Java as well, I tried to remove the invalid characters from the String using:

body = body.replaceAll("‘", ""); 
body = body.replaceAll("’", ""); 

// Strip invalid characters

public String stripNonValidXMLCharacters(String in) { 
     StringBuffer out = new StringBuffer(); // Used to hold the output. 
     char current; // Used to reference the current character. 

     if (in == null || ("".equals(in))) { 
      return ""; // vacancy test. 
     } 
     for (int i = 0; i < in.length(); i++) { 
      //NOTE: No IndexOutOfBoundsException caught here; it should not happen. 
      current = in.charAt(i); 
      if ((current == 0x9) 
        || (current == 0xA) 
        || (current == 0xD) 
        || ((current >= 0x20) && (current <= 0xD7FF)) 
        || ((current >= 0xE000) && (current <= 0xFFFD)) 
        || ((current >= 0x10000) && (current <= 0x10FFFF))) 
       out.append(current); 
     } 
     return out.toString(); 
    } 

// Strip again

private String stripNonValidXMLCharacter(String in) {  
     if (in == null || ("".equals(in))) { 
      return null; 
     } 
     StringBuffer out = new StringBuffer(in); 
     for (int i = 0; i < out.length(); i++) { 
      if (out.charAt(i) == 0x1a) { 
       out.setCharAt(i, '-'); 
      } 
     } 
     return out.toString(); 
    } 

// If any

emailText = emailText.replaceAll("[\\u0000-\\u0008\\u000B\\u000C" 
         + "\\u000E-\\u001F" 
         + "\\uD800-\\uDFFF\\uFFFE\\uFFFF\\u00C5\\u00D4\\u00EC" 
         + "\\u00A8\\u00F4\\u00B4\\u00CC\\u2211]", " "); 
      emailText = emailText.replaceAll("[\\x00-\\x1F]", ""); 
      emailText = emailText.replaceAll(
            "[\\x00-\\x08\\x0b\\x0c\\x0e-\\x1f]", ""); 
      emailText = emailText.replaceAll("\\p{C}", ""); 

to replace the special characters, but none of these attempts work. Also, the XML string starts with the following:

<?xml version="1.0" encoding="UTF-8"?> 
        <EMAILS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNameSpaceSchemaLocation=".\\SMTPSchema.xsd\"> 

I think the problem occurs when the Word document contains multiple tabs. For example:

Text......text 
<newLine> 
<tab><tab><tab> text...text 
<newLine> 

The generated XML string is:

<?xml version="1.0" encoding="UTF-8"?> <EMAILS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNameSpaceSchemaLocation=".\SMTPSchema.xsd"> <EMAIL SOURCE="[email protected]" DEST="[email protected]" CC="" BCC="[email protected]" SUBJECT="test 61" BODY="As such there was no mechanism constructed to migrate the enrollment user base to Data Collection or to keep security attributes for common users in sync between the two systems. The purpose of this document is to outline two strategies for bring the user base between the two applications into sync.? It still is the same. ** Please note: This e-mail message was sent from a notification-only address that cannot accept incoming e-mail. Please do not reply to this message."/> </EMAILS> 

Note that the "?" is where the multiple tabs were in the Word document. I hope my question is clear and that someone can help solve this.

Thanks

Answers

0

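The invalid (hidden) characters came from the UI (the Flex TextArea), so they had to be handled in the UI so that they are never passed on to Java. I handled and removed them with a changingHandler on the Flex TextArea that restricts the allowed characters.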
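Even with the UI fix in place, a guard on the Java side may still be worth having. The following is a minimal sketch, not the code used in the application: it keeps only the code points allowed by the XML 1.0 `Char` production and walks the string by code point rather than by `char`, so supplementary characters survive and unpaired surrogates are dropped. The class name `XmlCharFilter` and its method names are hypothetical.

```java
final class XmlCharFilter {
    private XmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of the input with every disallowed code point removed. */
    static String strip(String in) {
        if (in == null || in.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            // codePointAt joins a valid surrogate pair into one code point;
            // an unpaired surrogate comes back as-is and is filtered out.
            int cp = in.codePointAt(i);
            if (isValidXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

The filter only helps if it is applied to the exact string that ends up in the XML, immediately before it is placed on the queue. If the 0x1A is introduced later (for example during a charset conversion), filtering earlier will not catch it.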

0

Have you tried using an XML library such as TagSoup/JSoup/JTidy to sanitize your XML?

+0

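Tim, I only have to generate the XML string. Another application (which we have no control over) takes the string and parses it. Can I use any of these technologies to generate the string? – Harry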

+0

I would think that feeding your XML into one of those libraries and using the cleaned-up result would solve your problem. –

+0

Using an XML library to build the XML in the first place might be a better idea. –
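As a rough illustration of that suggestion, here is a sketch that builds the message with the JDK's StAX `XMLStreamWriter` instead of string concatenation. The element and attribute names are taken from the XML in the question; the class `EmailXmlBuilder`, its methods, and the omission of the `xsi` schema attributes are assumptions made to keep the example short. The writer escapes markup characters such as `&`, `<` and `"` in attribute values, but it should not be relied on to remove control characters such as 0x1A, so the sketch filters those explicitly first.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class EmailXmlBuilder {
    private EmailXmlBuilder() {}

    /** Drops every code point outside the XML 1.0 Char production. */
    static String stripInvalidXmlChars(String in) {
        if (in == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); ) {
            int cp = in.codePointAt(i);
            boolean valid = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (valid) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Builds an EMAILS document containing a single EMAIL element. */
    static String build(String source, String dest, String subject, String body) {
        StringWriter sw = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(sw);
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("EMAILS");
            w.writeEmptyElement("EMAIL");
            // Filter first, then let the writer handle attribute escaping.
            w.writeAttribute("SOURCE", stripInvalidXmlChars(source));
            w.writeAttribute("DEST", stripInvalidXmlChars(dest));
            w.writeAttribute("SUBJECT", stripInvalidXmlChars(subject));
            w.writeAttribute("BODY", stripInvalidXmlChars(body));
            w.writeEndElement(); // closes EMAILS
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not build email XML", e);
        }
        return sw.toString();
    }
}
```

Calling `EmailXmlBuilder.build(from, to, subject, body)` returns the string that would go on the queue. Because the receiving parser cannot be changed, it is worth testing the result with the same kind of parser the other application uses.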
