對於
任何比平凡短輸入其他(並且甚至可能的話),我使用golang.org/x/text/encoding/unicode
包轉換爲UTF-16(如@peterSo和@JimB指出,稍微偏離過時的UCS-2不同)。
使用這個(超過unicode/utf16
)(和golang.org/x/text/transform
包)的好處是,你得到BOM支持,或大或小端,並且您可以編碼/解碼短字符串或字節,但你也可以將此作爲過濾器io.Reader
或io.Writer
可以在處理數據時轉換數據,而不是全部預先處理(例如,對於大量數據流,您不需要一次將所有數據全部存儲在內存中)。
例如爲:
package main
import (
"bytes"
"fmt"
"io"
"io/ioutil"
"log"
"strings"
"golang.org/x/text/encoding/unicode"
"golang.org/x/text/transform"
)
const input = "Bien joué"
func main() {
// Get a `transform.Transformer` for encoding.
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
t := e.NewEncoder()
// For decoding, allows a Byte Order Mark at the start to
// switch to corresponding Unicode decoding (UTF-8, UTF-16BE, or UTF-16LE)
// otherwise we use `e` (UTF-16BE without BOM):
t2 := unicode.BOMOverride(e.NewDecoder())
_ = t2 // we don't show/use this
// If you have a string:
str := input
outstr, n, err := transform.String(t, str)
if err != nil {
log.Fatal(err)
}
fmt.Printf("string: n=%d, bytes=%02x\n", n, []byte(outstr))
// If you have a []byte:
b := []byte(input)
outbytes, n, err := transform.Bytes(t, b)
if err != nil {
log.Fatal(err)
}
fmt.Printf("bytes: n=%d, bytes=%02x\n", n, outbytes)
// If you have an io.Reader for the input:
ir := strings.NewReader(input)
r := transform.NewReader(ir, t)
// Now just read from r as you normal would and the encoding will
// happen as you read, good for large sources to avoid pre-encoding
// everything. Here we'll just read it all in one go though which negates
// that benefit (normally avoid ioutil.ReadAll).
outbytes, err = ioutil.ReadAll(r)
if err != nil {
log.Fatal(err)
}
fmt.Printf("reader: len=%d, bytes=%02x\n", len(outbytes), outbytes)
// If you have an io.Writer for the output:
var buf bytes.Buffer
w := transform.NewWriter(&buf, t)
_, err = fmt.Fprint(w, input) // or io.Copy from an io.Reader, or whatever
if err != nil {
log.Fatal(err)
}
fmt.Printf("writer: len=%d, bytes=%02x\n", buf.Len(), buf.Bytes())
}
// Whichever of these you need you could of
// course put in a single simple function. E.g.:
// NewUTF16BEWriter returns a new writer that wraps w
// by transforming the bytes written into UTF-16-BE.
func NewUTF16BEWriter(w io.Writer) io.Writer {
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
return transform.NewWriter(w, e.NewEncoder())
}
// ToUTFBE converts UTF8 `b` into UTF-16-BE.
func ToUTF16BE(b []byte) ([]byte, error) {
e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
out, _, err := transform.Bytes(e.NewEncoder(), b)
return out, err
}
給出:
string: n=10, bytes=004200690065006e0020006a006f007500e9
bytes: n=10, bytes=004200690065006e0020006a006f007500e9
reader: len=18, bytes=004200690065006e0020006a006f007500e9
writer: len=18, bytes=004200690065006e0020006a006f007500e9
你真的需要限制在UCS-2的輸出?你的Python例子是針對UTF-16BE的,這不是一回事。 – JimB