爲什麼使用UTF-16LE寫入Groovy文件會產生BOM char？

您是否知道爲什麼下面的第一條和第二條條線不會生成BOM到第三行？我認爲UTF-16LE是正確的編碼名稱，編碼不會自動創建BOM到文件的開頭。爲什麼使用UTF-16LE寫入Groovy文件會產生BOM char？

new File("foo-wo-bom.txt").withPrintWriter("utf-16le") {it << "test"} 
new File("foo-bom1.txt").withPrintWriter("UnicodeLittleUnmarked") {it << "test"} 
new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"}

另一個樣品

new File("foo-bom.txt").withPrintWriter("UTF-16LE") {it << "test"} 
new File("foo-bom.txt").getBytes().each {System.out.format("%02x ", it)}

打印

ff fe 74 00 65 00 73 00 74 00

和用java

 PrintWriter w = new PrintWriter("foo.txt","UTF-16LE"); 
     w.print("test"); 
     w.close(); 
     FileInputStream r = new FileInputStream("foo.txt"); 
     int c; 
     while ((c = r.read()) != -1) { 
      System.out.format("%02x ",c); 
     } 
     r.close();

打印

74 00 65 00 73 00 74 00

和Java是不會產生物料清單和用Groovy有物料清單。

來源

2015-05-29 JukkaU

歡迎來到StackOverflow。我認爲charset會不區分大小寫（它應該在Java中），但是沒有任何文檔需要確認，我只能假設'utf-16le'（小寫）告訴'withPrintWriter（）'不發出BOM ，並且「UTF-16LE」（大寫）告訴它發出BOM。這是這個例子中唯一的區別。 'UnicodeLittleUnmarked'強制BOM被跳過，'UnicodeLittle'強制BOM，但是在Groovy中'utf-16le' /'UTF16-LE'可能更模糊？ –

我對Java和PrintWriter也進行了測試，並且這些編碼都不會生成BOM。我認爲這是正確的。如果我定義爲LE或BE。無需設置BOM。如果我只使用UTF-16，Java使用Little Endian寫入文件，並且還有BOM 在常規中，似乎utf-16和UTF-16產生BOM。 – JukkaU

似乎與withPrintWriter的行爲有所不同。在groovyConsole中

File file = new File("tmp.txt") 
try { 
    String text = " " 
    String charset = "UTF-16LE" 

    file.withPrintWriter(charset) { it << text } 
    println "withPrintWriter" 
    file.getBytes().each { System.out.format("%02x ", it) } 

    PrintWriter w = new PrintWriter(file, charset) 
    w.print(text) 
    w.close() 
    println "\n\nnew PrintWriter" 
    file.getBytes().each { System.out.format("%02x ", it) } 
} finally { 
    file.delete() 
}

嘗試了這一點，它輸出

 
withPrintWriter 
ff fe 20 00 

new PrintWriter 
20 00

這是因爲調用new PrintWriter調用Java的構造，但調用withPrintWriter最後調用org.codehaus.groovy.runtime.ResourceGroovyMethods.writeUTF16BomIfRequired()，其中寫入BOM。

我不確定這種行爲差異是否是故意的。我很好奇這件事，所以我asked上的mailing list。有人應該知道設計背後的歷史。

編輯：GROOVY-7465是在上述討論之外創建的。

來源

2015-06-03 17:54:05 Keegan

我加了一些例子 – JukkaU

謝謝。這些例子有助於澄清你的問題。我編輯了我的答案。 – Keegan

爲什麼使用UTF-16LE寫入Groovy文件會產生BOM char？

回答

相關問題