2014-04-22 48 views
1

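I'm working on a C# twitter-text library, and Twitter has added double-word Unicode character tests (characters outside the Basic Multilingual Plane) to its conformance tests. YamlDotNet does not deserialize these double-word Unicode characters correctly.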

https://github.com/twitter/twitter-text-conformance/blob/master/validate.yml

Here is an NUnit test method that runs against the file above.

[Test]
public void TestDoubleWordUnicodeYamlRetrieval()
{
    // Build the path once so the existence check and the reader use the same file.
    var yamlPath = Path.Combine(conformanceDir, "validate.yml");
    Assert.IsTrue(File.Exists(yamlPath), "Yaml file " + yamlPath + " does not exist.");

    var yaml = new YamlStream();
    using (var stream = new StreamReader(yamlPath))
    {
        yaml.Load(stream);
    }

    var root = yaml.Documents[0].RootNode as YamlMappingNode;
    var testNode = new YamlScalarNode("tests");
    Assert.IsTrue(root.Children.ContainsKey(testNode), "Document is missing test node.");
    var tests = root.Children[testNode] as YamlMappingNode;
    Assert.IsNotNull(tests, "Test node is not YamlMappingNode");

    var typeNode = new YamlScalarNode("lengths");
    Assert.IsTrue(tests.Children.ContainsKey(typeNode), "Test type lengths not found in tests.");
    var typeTests = tests.Children[typeNode] as YamlSequenceNode;
    Assert.IsNotNull(typeTests, "lengths tests are not YamlSequenceNode");

    var count = 0;
    foreach (YamlMappingNode item in typeTests)
    {
        // ConvertNode<T> is a helper that is not shown here.
        var text = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "text").Value) as string;
        var description = ConvertNode<string>(item.Children.Single(x => x.Key.ToString() == "description").Value) as string;
        // Normalize throws ArgumentException when the string contains invalid Unicode.
        Assert.DoesNotThrow(() => { text.Normalize(NormalizationForm.FormC); }, String.Format("Yaml couldn't parse a double word unicode string at test {0} - {1}.", count, description));
        count++;
    }
}

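This is the error it produces: Vocus.TwitterText.Tests.ConformanceTest.TestDoubleWordUnicodeYamlRetrieval: Yaml couldn't parse a double word unicode string at test 5 - Count unicode chars outside the basic multilingual plane (double word). Unexpected exception: System.ArgumentException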

Answers

0

I don't think this is the YAML parser. You could try:

using (var stream = new StreamReader(path, Encoding.UTF8)) 
{ 
    var yaml = new YamlStream(); 
    yaml.Load(stream); 
    //Do the rest of your code 
} 
+0

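Sorry for replying so late, but this didn't help. The specific line in question does not contain raw UTF-8 characters but escaped Unicode character representations: text: "\U00010000\U0010ffff". When I use the stream reader to output the file as a string, the characters are correct. When I retrieve the node through yaml, the output is \0. –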
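
A note on the \0 output, offered as a guess rather than a confirmed diagnosis: that value is what you would see if an 8-digit \U escape were narrowed to a single 16-bit char instead of being expanded into a surrogate pair. The sketch below is plain .NET and does not use YamlDotNet at all; the class name SupplementaryPlaneDemo is made up for illustration. It only shows the difference between truncating a code point and converting it with char.ConvertFromUtf32.

```csharp
using System;

static class SupplementaryPlaneDemo
{
    static void Main()
    {
        // The two code points from the escaped text "\U00010000\U0010ffff".
        int[] codePoints = { 0x10000, 0x10FFFF };

        foreach (int cp in codePoints)
        {
            // Narrowing to a 16-bit char drops the high bits of the code point.
            char truncated = unchecked((char)cp);

            // ConvertFromUtf32 returns a two-char surrogate pair for code points above U+FFFF.
            string pair = char.ConvertFromUtf32(cp);

            Console.WriteLine(
                "U+{0:X}: truncated = U+{1:X4}, pair length = {2}, starts with high surrogate = {3}",
                cp, (int)truncated, pair.Length, char.IsHighSurrogate(pair[0]));
        }
    }
}
```

For U+10000 the truncated value is U+0000, which would match the \0 described in the comment above. Whether YamlDotNet actually did this in the version being used would need to be checked against its scanner source.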