2015-05-31 26 views
1

I am porting a Python program to Go and need to convert a Unicode string to a UCS-2 hex string.

In Python, this is quite simple:

u"Bien joué".encode('utf-16-be').encode('hex') 
-> 004200690065006e0020006a006f007500e9 

I am a beginner in Go, and the simplest way I have found is:

package main 

import (
    "fmt" 
    "strings" 
) 

func main() { 
    str := "Bien joué" 
    fmt.Printf("str: %s\n", str) 

    ucs2HexArray := []rune(str) 
    s := fmt.Sprintf("%U", ucs2HexArray) 
    a := strings.Replace(s, "U+", "", -1) 
    b := strings.Replace(a, "[", "", -1) 
    c := strings.Replace(b, "]", "", -1) 
    d := strings.Replace(c, " ", "", -1) 
    fmt.Printf("->: %s", d) 
} 

str: Bien joué 
->: 004200690065006E0020006A006F007500E9 
Program exited. 

I am fairly sure this is not efficient. How can I improve it?

Thanks

+3

Do you really need to restrict the output to UCS-2? Your Python example targets UTF-16BE, which is not the same thing. – JimB

Answers

2

Make this conversion a function; then you can easily improve the conversion algorithm later. For example,

package main 

import (
    "fmt" 
    "strings" 
    "unicode/utf16" 
) 

func hexUTF16FromString(s string) string { 
    hex := fmt.Sprintf("%04x", utf16.Encode([]rune(s))) 
    return strings.Replace(hex[1:len(hex)-1], " ", "", -1) 
} 

func main() { 
    str := "Bien joué" 
    fmt.Println(str) 
    hex := hexUTF16FromString(str) 
    fmt.Println(hex) 
} 

Output:

Bien joué 
004200690065006e0020006a006f007500e9 
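Once the conversion lives in its own function, one possible later improvement is to skip the Sprintf/Replace round trip and write the bytes directly. The following is only a sketch under that assumption: the name hexUTF16BE is hypothetical, and it uses the standard encoding/binary and encoding/hex packages rather than anything from the code above.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// hexUTF16BE returns the big-endian UTF-16 encoding of s as lowercase hex.
// The name is hypothetical and chosen for this illustration only.
func hexUTF16BE(s string) string {
	units := utf16.Encode([]rune(s)) // UTF-16 code units, surrogates included
	buf := make([]byte, 2*len(units))
	for i, u := range units {
		// Write each 16-bit code unit high byte first.
		binary.BigEndian.PutUint16(buf[2*i:], u)
	}
	return hex.EncodeToString(buf)
}

func main() {
	fmt.Println(hexUTF16BE("Bien joué"))
	// prints 004200690065006e0020006a006f007500e9
}
```

This avoids formatting a slice as text and then stripping brackets and spaces, so it does not depend on how fmt happens to print slices.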

Note:

You say "convert a unicode string to a UCS-2 string", but your Python example uses UTF-16:

u"Bien joué".encode('utf-16-be').encode('hex') 

The Unicode Consortium

UTF-16 FAQ

Q: What is the difference between UCS-2 and UTF-16?

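A: UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1, before surrogate code points and UTF-16 were added to Version 2.0 of the standard. This term should now be avoided.

UCS-2 does not describe a data format distinct from UTF-16, because both use exactly the same 16-bit code unit representations. However, UCS-2 does not interpret surrogate code points, and thus cannot be used to conformantly represent supplementary characters.

Sometimes in the past an implementation has been labeled "UCS-2" to indicate that it does not support supplementary characters and doesn't interpret pairs of surrogate code points as characters. Such an implementation would not handle processing of character properties, code point boundaries, collation, etc. for supplementary characters.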

2
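For anything other than trivially short input (and possibly even then), I would use the golang.org/x/text/encoding/unicode package to convert to UTF-16 (which, as @peterSo and @JimB point out, is slightly different from the obsolete UCS-2).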

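The benefit of using this (together with the golang.org/x/text/transform package) over unicode/utf16 is that you get BOM support and a choice of big or little endian. You can encode or decode short strings or byte slices, but you can also apply it as a filter on an io.Reader or io.Writer, transforming data as you process it rather than all up front (for example, with a large data stream you do not need to hold everything in memory at once).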

For example:

package main 

import (
    "bytes" 
    "fmt" 
    "io" 
    "io/ioutil" 
    "log" 
    "strings" 

    "golang.org/x/text/encoding/unicode" 
    "golang.org/x/text/transform" 
) 

const input = "Bien joué" 

func main() { 
    // Get a `transform.Transformer` for encoding. 
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM) 
    t := e.NewEncoder() 
    // For decoding, allows a Byte Order Mark at the start to 
    // switch to corresponding Unicode decoding (UTF-8, UTF-16BE, or UTF-16LE) 
    // otherwise we use `e` (UTF-16BE without BOM): 
    t2 := unicode.BOMOverride(e.NewDecoder()) 
    _ = t2 // we don't show/use this 

    // If you have a string: 
    str := input 
    outstr, n, err := transform.String(t, str) 
    if err != nil { 
     log.Fatal(err) 
    } 
    fmt.Printf("string: n=%d, bytes=%02x\n", n, []byte(outstr)) 

    // If you have a []byte: 
    b := []byte(input) 
    outbytes, n, err := transform.Bytes(t, b) 
    if err != nil { 
     log.Fatal(err) 
    } 
    fmt.Printf("bytes: n=%d, bytes=%02x\n", n, outbytes) 

    // If you have an io.Reader for the input: 
    ir := strings.NewReader(input) 
    r := transform.NewReader(ir, t) 
    // Now just read from r as you normally would and the encoding will 
    // happen as you read, good for large sources to avoid pre-encoding 
    // everything. Here we'll just read it all in one go though which negates 
    // that benefit (normally avoid ioutil.ReadAll). 
    outbytes, err = ioutil.ReadAll(r) 
    if err != nil { 
     log.Fatal(err) 
    } 
    fmt.Printf("reader: len=%d, bytes=%02x\n", len(outbytes), outbytes) 

    // If you have an io.Writer for the output: 
    var buf bytes.Buffer 
    w := transform.NewWriter(&buf, t) 
    _, err = fmt.Fprint(w, input) // or io.Copy from an io.Reader, or whatever 
    if err != nil { 
     log.Fatal(err) 
    } 
    fmt.Printf("writer: len=%d, bytes=%02x\n", buf.Len(), buf.Bytes()) 
} 

// Whichever of these you need you could of 
// course put in a single simple function. E.g.: 

// NewUTF16BEWriter returns a new writer that wraps w 
// by transforming the bytes written into UTF-16-BE. 
func NewUTF16BEWriter(w io.Writer) io.Writer { 
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM) 
    return transform.NewWriter(w, e.NewEncoder()) 
} 

// ToUTF16BE converts UTF-8 `b` into UTF-16-BE. 
func ToUTF16BE(b []byte) ([]byte, error) { 
    e := unicode.UTF16(unicode.BigEndian, unicode.IgnoreBOM) 
    out, _, err := transform.Bytes(e.NewEncoder(), b) 
    return out, err 
} 

This gives:

string: n=10, bytes=004200690065006e0020006a006f007500e9 
bytes: n=10, bytes=004200690065006e0020006a006f007500e9 
reader: len=18, bytes=004200690065006e0020006a006f007500e9 
writer: len=18, bytes=004200690065006e0020006a006f007500e9 