2013-02-22 55 views
3

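As opposed to application/xml documents, which can do whatever they please, or normalizedString values, which convert every whitespace sequence to a single space character, I'm asking here specifically about text/xml documents with string values. For simplicity's sake, let's assume I'm only using ASCII characters in a UTF-8 encoded document. What is the correct way to encode CR-LF line breaks in text/xml values?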

Given the following two-line text string that I wish to represent in XML:

Hello 
World! 

which is the following bytes in memory:

0000: 48 65 6c 6c 6f 0d 0a 57 6f 72 6c 64 21 Hello..World! 

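According to RFC 2046, any text/* MIME type MUST (not SHOULD) represent a line break as a carriage return followed by a linefeed. In light of this, the following XML fragment should be correct: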

<tag>Hello 
World!</tag> 

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 0d 0a 57 6f 72 6c <tag>Hello..Worl
0010: 64 21 3c 2f 74 61 67 3e                         d!</tag>

But I regularly see files like the following:

<tag><![CDATA[Hello 
World!]]></tag> 

Or, even stranger:

<tag>Hello&#xD;
World!</tag>

where the &#xD; sequence is followed by a single linefeed character:

0000: 3c 74 61 67 3e 48 65 6c 6c 6f 26 23 78 44 3b 0a <tag>Hello&#xD;.
0010: 57 6f 72 6c 64 21 3c 2f 74 61 67 3e             World!</tag>

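What am I missing here? What is the correct way to represent multiple lines of text in an XML string value so that it comes out the other end unmolested?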

Answers

2

After writing NUnit tests in Mono and JUnit tests in Java, the answer appears to be to use <tag>Hello&#13;\nWorld!</tag> or <tag>Hello&#xd;\nWorld!</tag>, as the following tests show...

Foo.cs:

using System.IO;
using System.Text;
using System.Xml.Serialization;

namespace XmlStringTests
{
    public class Foo
    {
        public string greeting;

        // Deserializes a Foo from an XML string by feeding its UTF-8 bytes to XmlSerializer.
        public static Foo DeserializeFromXmlString(string xml)
        {
            byte[] buffer = Encoding.UTF8.GetBytes(xml);
            using (MemoryStream memoryStream = new MemoryStream(buffer))
            {
                XmlSerializer xs = new XmlSerializer(typeof(Foo));
                return (Foo)xs.Deserialize(memoryStream);
            }
        }
    }
}

XmlStringTests.cs:

using NUnit.Framework;

namespace XmlStringTests
{
    [TestFixture]
    public class XmlStringTests
    {
        const string expected = "Hello\u000d\u000aWorld!";

        // Literal CR LF inside CDATA is still normalized to LF.
        [Test(Description="Fails")]
        public void Cdata()
        {
            const string test = "<Foo><greeting><![CDATA[Hello\u000d\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString(test);
            Assert.AreEqual(expected, bar.greeting);
        }

        // Character references are not expanded inside CDATA, so "&#13;" stays as literal text.
        [Test(Description="Fails")]
        public void CdataWithHash13()
        {
            const string test = "<Foo><greeting><![CDATA[Hello&#13;\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString(test);
            Assert.AreEqual(expected, bar.greeting);
        }

        // Same as above, with the hexadecimal form of the reference.
        [Test(Description="Fails")]
        public void CdataWithHashxD()
        {
            const string test = "<Foo><greeting><![CDATA[Hello&#xd;\u000aWorld!]]></greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString(test);
            Assert.AreEqual(expected, bar.greeting);
        }

        // Literal CR LF in element content is normalized to LF.
        [Test(Description="Fails")]
        public void Simple()
        {
            const string test = "<Foo><greeting>Hello\u000d\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString(test);
            Assert.AreEqual(expected, bar.greeting);
        }

        // The CR is written as a decimal character reference, so it survives normalization.
        [Test(Description="Passes")]
        public void SimpleWithHash13()
        {
            const string test = "<Foo><greeting>Hello&#13;\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString(test);
            Assert.AreEqual(expected, bar.greeting);
        }

        // The CR is written as a hexadecimal character reference, so it survives normalization.
        [Test(Description="Passes")]
        public void SimpleWithHashxD()
        {
            const string test = "<Foo><greeting>Hello&#xd;\u000aWorld!</greeting></Foo>";
            Foo bar = Foo.DeserializeFromXmlString(test);
            Assert.AreEqual(expected, bar.greeting);
        }
    }
}

Foo.java:

import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;

@XmlRootElement(name = "Foo")
@XmlType(propOrder = { "greeting" })
public class Foo {
    public String greeting;

    // Unmarshals a Foo from an XML string; returns null if JAXB reports an error.
    public static Foo DeserializeFromXmlString(String xml) {
        try {
            JAXBContext context = JAXBContext.newInstance(Foo.class);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            return (Foo) unmarshaller.unmarshal(new StringReader(xml));
        } catch (JAXBException e) {
            e.printStackTrace();
            return null;
        }
    }
}

XmlStringTests.java:

import static org.junit.Assert.*;
import org.junit.Test;

public class XmlStringTests {
    String expected = "Hello\r\nWorld!";

    // Fails: literal CR LF inside CDATA is still normalized to LF.
    @Test
    public void testCdata() {
        String test = "<Foo><greeting><![CDATA[Hello\r\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString(test);
        assertEquals(expected, bar.greeting);
    }

    // Fails: character references are not expanded inside CDATA.
    @Test
    public void testCdataWithHash13() {
        String test = "<Foo><greeting><![CDATA[Hello&#13;\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString(test);
        assertEquals(expected, bar.greeting);
    }

    // Fails: same as above, with the hexadecimal form of the reference.
    @Test
    public void testCdataWithHashxD() {
        String test = "<Foo><greeting><![CDATA[Hello&#xd;\nWorld!]]></greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString(test);
        assertEquals(expected, bar.greeting);
    }

    // Fails: literal CR LF in element content is normalized to LF.
    @Test
    public void testSimple() {
        String test = "<Foo><greeting>Hello\r\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString(test);
        assertEquals(expected, bar.greeting);
    }

    // Passes: the CR is written as a decimal character reference.
    @Test
    public void testSimpleWithHash13() {
        String test = "<Foo><greeting>Hello&#13;\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString(test);
        assertEquals(expected, bar.greeting);
    }

    // Passes: the CR is written as a hexadecimal character reference.
    @Test
    public void testSimpleWithHashxD() {
        String test = "<Foo><greeting>Hello&#xd;\nWorld!</greeting></Foo>";
        Foo bar = Foo.DeserializeFromXmlString(test);
        assertEquals(expected, bar.greeting);
    }
}

I hope this saves someone some time.
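On the writing side, the passing tests suggest that a producer has to emit every carriage return in element content as a character reference. The following is only a minimal sketch of that idea, assuming a hand-rolled escaper; the class name `CrEscaper` and the method `escapeText` are hypothetical and not part of any library.

```java
public class CrEscaper {
    // Escapes element text so that CR survives XML line-end normalization.
    // Hypothetical helper for illustration only.
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;"); break;
                case '<':  sb.append("&lt;");  break;
                case '>':  sb.append("&gt;");  break;
                case '\r': sb.append("&#13;"); break; // keep CR as a character reference
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The LF is left as a literal character; only the CR is escaped.
        System.out.println("<tag>" + escapeText("Hello\r\nWorld!") + "</tag>");
    }
}
```

Many XML serializers already escape CR in text nodes on output, so a helper like this is mainly relevant when the XML is assembled by string concatenation.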

2

CR (&#x0D;), LF (&#x0A;), CRLF, or a few other combinations are all valid. As noted in the spec, all of these are translated to a single &#x0A; character.
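The end-of-line handling in section 2.11 of the XML 1.0 spec applies to literal line breaks in the input; character references are expanded afterwards. A quick way to check what a given parser does is to parse a few variants and print the result with the control characters made visible. This is a sketch using the JDK's built-in DOM parser; the class `LineEndingDemo` and its helper methods are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class LineEndingDemo {
    // Parses the XML string and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    // Makes CR and LF visible as \r and \n when printing.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Literal CR LF in element content.
        System.out.println(visible(textOf("<tag>Hello\r\nWorld!</tag>")));
        // CR written as a character reference, followed by a literal LF.
        System.out.println(visible(textOf("<tag>Hello&#13;\nWorld!</tag>")));
        // Literal CR LF inside a CDATA section.
        System.out.println(visible(textOf("<tag><![CDATA[Hello\r\nWorld!]]></tag>")));
    }
}
```

With a conforming parser, only the second variant should give back both characters, which matches the test results reported in the other answer.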

+1

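According to the same spec, CR (#xD) is a valid Char inside a CDATA block, so it shouldn't be translated. I would define getting LF back from a CR LF input as being molested. Is there a way to correctly encode XML so that CR LF comes back on the receiving end, or is XML just broken and not compliant with the text/xml MIME type? – AlwaysLearning 2013-02-25 13:18:52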

+0

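Reading the spec, I interpret it as: if any of the following raw code point sequences is found in the input, replace it with 0xa: 0xd 0xa, 0xd 0x85, 0x85, 0x2028, or 0xd followed by "anything other than 0xa or 0x85". Since this replacement happens "before parsing" (see the reference), any literal character entity (i.e., "&#xd;") should be preserved. So, for that example, the parsed content should be the byte sequence "0xd" rather than "0xa". Am I reading the spec correctly? Your answer seems to indicate that this replacement could happen **after** parsing rather than before... – binki 2017-07-24 14:24:11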
