String to UCS-2

I want to port my Python program to Go. It converts a Unicode string to a UCS-2 hex string.

In Python 2, this is quite simple:

u"Bien joué".encode('utf-16-be').encode('hex') 
-> 004200690065006e0020006a006f007500e9

I'm a beginner in Go, and the simplest way I found is:

package main 

import (
    "fmt" 
    "strings" 
) 

func main() { 
    str := "Bien joué" 
    fmt.Printf("str: %s\n", str) 

    ucs2HexArray := []rune(str) 
    s := fmt.Sprintf("%U", ucs2HexArray) 
    a := strings.Replace(s, "U+", "", -1) 
    b := strings.Replace(a, "[", "", -1) 
    c := strings.Replace(b, "]", "", -1) 
    d := strings.Replace(c, " ", "", -1) 
    fmt.Printf("->: %s", d) 
} 

str: Bien joué 
->: 004200690065006E0020006A006F007500E9 

I really don't think this is efficient. How can I improve it?

Thanks


2015-05-31 Julio

Do you really need to restrict the output to UCS-2? Your Python example is for UTF-16BE, which is not the same thing. – JimB
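If the output really must be restricted to UCS-2, one option is to reject anything outside the Basic Multilingual Plane instead of silently emitting surrogate pairs. A minimal sketch, assuming big-endian output and lowercase hex to match the Python example; `ucs2HexFromString` is a hypothetical name, not a library function:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// ucs2HexFromString is a hypothetical helper: it encodes s as big-endian
// UCS-2 (exactly one 16-bit unit per code point) and hex-encodes the bytes.
// Code points above U+FFFF cannot be represented in UCS-2, so they are
// reported as an error rather than turned into surrogate pairs.
func ucs2HexFromString(s string) (string, error) {
	buf := make([]byte, 0, 2*len(s))
	for i, r := range s {
		if r > 0xFFFF {
			return "", fmt.Errorf("code point %U at byte %d is outside the BMP; not representable in UCS-2", r, i)
		}
		// High byte first (big-endian), then low byte.
		buf = append(buf, byte(r>>8), byte(r))
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	h, err := ucs2HexFromString("Bien joué")
	fmt.Println(h, err) // 004200690065006e0020006a006f007500e9 <nil>

	// A supplementary character is rejected.
	_, err = ucs2HexFromString("smile 😀")
	fmt.Println(err)
}
```

For text that stays inside the BMP, this produces the same bytes as UTF-16BE; the only difference is the explicit error for supplementary characters.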

Make the conversion a function; then you can easily improve the conversion algorithm later. For example,

package main 

import (
    "fmt" 
    "strings" 
    "unicode/utf16" 
) 

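// hexUTF16FromString returns the UTF-16 code units of s as a hex string.
func hexUTF16FromString(s string) string {
    // %04x formats a []uint16 as "[0042 0069 ...]"; drop the brackets
    // and then the separating spaces.
    hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s)))
    return strings.Replace(hex[1:len(hex)-1], " ", "", -1)
}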

func main() { 
    str := "Bien joué" 
    fmt.Println(str) 
    hex := hexUTF16FromString(str) 
    fmt.Println(hex) 
}

Output:

Bien joué 
004200690065006e0020006a006f007500e9

Note:

You say "convert a Unicode string to a UCS-2 string", but your Python example uses UTF-16:

u"Bien joué".encode('utf-16-be').encode('hex')

The Unicode Consortium

UTF-16 FAQ

Q: What is the difference between UCS-2 and UTF-16?

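A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.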

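UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.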

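Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.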


2015-05-31 13:24:32 peterSO

The standard library has a built-in utf16.Encode() (https://golang.org/pkg/unicode/utf16/#Encode) function for this.
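Building on utf16.Encode, the fmt formatting and the string replacements can be dropped entirely by writing each code unit into a byte slice and hex-encoding it with encoding/hex. A sketch; `hexUTF16BE` is a hypothetical name, and lowercase hex is used to match the Python output:

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE is a hypothetical helper: it converts s to UTF-16 code units,
// serializes each unit big-endian into a byte slice, and hex-encodes the
// result, with no intermediate fmt output or string replacement.
func hexUTF16BE(s string) string {
	units := utf16.Encode([]rune(s))
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		binary.BigEndian.PutUint16(buf[2*i:], u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué")) // 004200690065006e0020006a006f007500e9
	fmt.Println(hexUTF16BE("😀")) // d83dde00
}
```

If the uppercase form from the question is needed, wrap the result in strings.ToUpper. Unlike the `%U` approach, non-BMP characters come out as surrogate pairs rather than as a five-digit code point.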


2015-05-31 12:30:44 Evan

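For anything other than trivially short input (and possibly even then), I'd use the golang.org/x/text/encoding/unicode package to convert to UTF-16 (which, as @peterSO and @JimB pointed out, differs slightly from the obsolete UCS-2).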

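The advantage of using this package (rather than unicode/utf16), together with the golang.org/x/text/transform package, is that you get BOM support and a choice of big or little endian. You can encode/decode short strings or byte slices, but you can also use it as a filter on an io.Reader or io.Writer, so the data is transformed as you process it rather than all up front (e.g. for a large stream of data you don't need to hold it all in memory at once).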

For example:

package main 

import (
    "bytes" 
    "fmt" 
    "io" 
    "io/ioutil" 
    "log" 
    "strings" 

    "golang.org/x/text/encoding/unicode" 
    "golang.org/x/text/transform" 
) 

const input = "Bien joué" 

func main() { 
    // Get a `transform.Transformer` for encoding. 
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM) 
    t := e.NewEncoder() 
    // For decoding, allows a Byte Order Mark at the start to 
    // switch to corresponding Unicode decoding (UTF-8, UTF-16BE, or UTF-16LE) 
    // otherwise we use `e` (UTF-16BE without BOM): 
    t2 := unicode.BOMOverride(e.NewDecoder()) 
    _ = t2 // we don't show/use this 

    // If you have a string: 
    str := input 
    outstr, n, err := transform.String(t, str) 
    if err != nil { 
        log.Fatal(err)
    } 
    fmt.Printf("string: n=%d, bytes=%02x\n", n, []byte(outstr)) 

    // If you have a []byte: 
    b := []byte(input) 
    outbytes, n, err := transform.Bytes(t, b) 
    if err != nil { 
        log.Fatal(err)
    } 
    fmt.Printf("bytes: n=%d, bytes=%02x\n", n, outbytes) 

    // If you have an io.Reader for the input: 
    ir := strings.NewReader(input) 
    r := transform.NewReader(ir, t) 
    // Now just read from r as you normally would and the encoding will
    // happen as you read, good for large sources to avoid pre-encoding
    // everything. Here we'll just read it all in one go though, which negates
    // that benefit (normally avoid ioutil.ReadAll).
    outbytes, err = ioutil.ReadAll(r) 
    if err != nil { 
        log.Fatal(err)
    } 
    fmt.Printf("reader: len=%d, bytes=%02x\n", len(outbytes), outbytes) 

    // If you have an io.Writer for the output:
    var buf bytes.Buffer
    w := transform.NewWriter(&buf, t)
    _, err = fmt.Fprint(w, input) // or io.Copy from an io.Reader, or whatever
    if err != nil {
        log.Fatal(err)
    }
    // Close flushes any output still buffered in the transformer.
    if err := w.Close(); err != nil {
        log.Fatal(err)
    }
    fmt.Printf("writer: len=%d, bytes=%02x\n", buf.Len(), buf.Bytes())
} 

// Whichever of these you need, you could of
// course wrap in a single simple function. E.g.:

// NewUTF16BEWriter returns a new writer that wraps w
// by transforming the bytes written into UTF-16BE.
// Callers must Close it to flush any buffered output.
func NewUTF16BEWriter(w io.Writer) io.WriteCloser {
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM)
    return transform.NewWriter(w, e.NewEncoder())
}

// ToUTF16BE converts the UTF-8 encoded b into UTF-16BE.
func ToUTF16BE(b []byte) ([]byte, error) { 
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM) 
    out, _, err := transform.Bytes(e.NewEncoder(), b) 
    return out, err 
}

Gives:

string: n=10, bytes=004200690065006e0020006a006f007500e9 
bytes: n=10, bytes=004200690065006e0020006a006f007500e9 
reader: len=18, bytes=004200690065006e0020006a006f007500e9 
writer: len=18, bytes=004200690065006e0020006a006f007500e9
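The example above builds a decoder (t2) but does not use it. For the reverse direction, a standard-library-only sketch that turns such a hex string back into a Go string might look like this; `stringFromHexUTF16BE` is a hypothetical name, and it assumes big-endian input without a BOM (the BOMOverride decoder above is what would handle a BOM):

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"unicode/utf16"
)

// stringFromHexUTF16BE is a hypothetical helper that reverses the
// conversion: hex string -> UTF-16BE bytes -> code units -> Go string.
// It assumes big-endian byte order and no byte order mark.
func stringFromHexUTF16BE(h string) (string, error) {
	b, err := hex.DecodeString(h)
	if err != nil {
		return "", err
	}
	if len(b)%2 != 0 {
		return "", errors.New("odd number of bytes; not valid UTF-16")
	}
	units := make([]uint16, len(b)/2)
	for i := range units {
		units[i] = binary.BigEndian.Uint16(b[2*i:])
	}
	// utf16.Decode combines surrogate pairs; unpaired surrogates become U+FFFD.
	return string(utf16.Decode(units)), nil
}

func main() {
	s, err := stringFromHexUTF16BE("004200690065006e0020006a006f007500e9")
	fmt.Println(s, err) // Bien joué <nil>
}
```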


2015-05-31 15:28:48

