Cython: converting a unicode string to a wchar array

I am using Cython to interface with an external C API that accepts unicode strings in UCS2 format (an array of wchar). (I understand the limitations of UCS2 vis-à-vis UTF-16, but it is a third-party API.)

Cython version: 0.15.1
Python version: 2.6 (narrow Unicode build)
OS: FreeBSD

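The Cython User Guide deals extensively with converting unicode to byte strings, but I could not figure out how to convert to a 16-bit array. I realize that I first need to encode to UTF-16 (and I am assuming for now that code points beyond the BMP will not occur). What do I do next? Please help.

Thanks in advance.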

2013-10-29 Srikanth S.

This is very possible in Python 3; one solution looks like this:

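# cython: language_level=3

from cpython.mem cimport PyMem_Free
from libc.stddef cimport wchar_t

cdef extern from "Python.h":
    # Returns a newly allocated, NUL-terminated buffer, or NULL with an exception set.
    wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *) except NULL

cdef extern from "wchar.h":
    int wprintf(const wchar_t *, ...)

my_string = u"Foobar\n"
cdef Py_ssize_t length
cdef wchar_t *my_wchars = PyUnicode_AsWideCharString(my_string, &length)

wprintf(my_wchars)
print("Length:", <long>length)
# length excludes the terminating NUL, so it indexes the terminator itself.
print("Null End:", my_wchars[length] == 0)

# The buffer is allocated by Python and must be released with PyMem_Free.
PyMem_Free(my_wchars)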

A less pleasant approach for Python 2 is shown below, but it may be relying on undefined or broken behaviour, so I would not trust it too readily:

# cython: language_level=2 

from cpython.ref cimport PyObject 
from libc.stddef cimport wchar_t 
from libc.stdio cimport fflush, stdout 
from libc.stdlib cimport malloc, free 

cdef extern from "Python.h": 
    ctypedef PyObject PyUnicodeObject 
    Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *o, wchar_t *w, Py_ssize_t size) 

my_string = u"Foobar\n" 
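# Cheating: the "UTF-16" codec prepends a 2-byte BOM, so for BMP-only text this
# yields len(my_string) + 1, which leaves room for the terminating NUL.
cdef Py_ssize_t length = len(my_string.encode("UTF-16")) // 2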
cdef wchar_t *my_wchars = <wchar_t *>malloc(length * sizeof(wchar_t)) 
cdef Py_ssize_t number_written = PyUnicode_AsWideChar(<PyUnicodeObject *>my_string, my_wchars, length) 

# wprintf breaks things for some reason 
print [my_wchars[i] for i in range(length)] 
print "Length:", <long>length 
print "Number Written:", <long>number_written 
print "Null End:", my_wchars[7] == 0 

free(my_wchars)
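
Note that wchar_t is 32 bits wide on many Unix-like systems, so the snippets above do not necessarily produce the 16-bit units a UCS2 API expects. Following the "encode to UTF-16 first" idea from the question, here is a minimal sketch of building a NUL-terminated array of 16-bit code units. It is written for Python 3 (on Python 2 the array method is fromstring rather than frombytes), the helper name to_ucs2_units is hypothetical, and it assumes the API wants native byte order with no BOM.

```python
import sys
from array import array


def to_ucs2_units(text):
    """Return text as a NUL-terminated array of 16-bit code units.

    Hypothetical helper; assumes native byte order and no BOM.
    """
    units = array("H")  # unsigned short
    if units.itemsize != 2:
        raise RuntimeError("unsigned short is not 16 bits on this platform")

    # The explicit -le/-be codecs do not emit a BOM, unlike plain "utf-16".
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    units.frombytes(text.encode(codec))

    # Surrogate code units mean the text left the BMP, which UCS2 cannot hold.
    if any(0xD800 <= unit <= 0xDFFF for unit in units):
        raise ValueError("text contains code points outside the BMP")

    units.append(0)  # terminating NUL
    return units


print(list(to_ucs2_units(u"Foobar\n")))
```

The array exposes its contents as one contiguous buffer, so the 16-bit units can then be handed to C code. How the pointer is obtained depends on the Cython version in use.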


2014-01-15 21:26:43 Veedrac
