2013-10-29

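Cython: converting a unicode string to a wchar array

I am using Cython to interface with an external C API that accepts Unicode strings in UCS2 format (an array of wchar). (I understand the limitations of UCS2 vis-a-vis UTF-16, but it is a third-party API.)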

  • Cython version: 0.15.1
  • Python version: 2.6 (narrow Unicode build)
  • OS: FreeBSD

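The Cython User Guide deals extensively with converting unicode to byte strings, but I could not figure out how to convert to a 16-bit array. I realize I first need to encode to UTF-16 (and I am assuming for now that code points beyond the BMP do not occur). What do I do next? Please help.

Thanks in advance.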

Answer


This is quite possible in Python 3. One solution looks like this:

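# cython: language_level=3

from cpython.mem cimport PyMem_Free
from libc.stddef cimport wchar_t

cdef extern from "Python.h":
    # Returns NULL with an exception set on failure.
    wchar_t* PyUnicode_AsWideCharString(object, Py_ssize_t *) except NULL

cdef extern from "wchar.h":
    int wprintf(const wchar_t *, ...)

my_string = u"Foobar\n"
cdef Py_ssize_t length
# Allocates a new NUL-terminated buffer; length excludes the terminator.
cdef wchar_t *my_wchars = PyUnicode_AsWideCharString(my_string, &length)

wprintf(my_wchars)
print("Length:", <long>length)
print("Null End:", my_wchars[length] == 0)

# The buffer is owned by the caller and must be released with PyMem_Free.
PyMem_Free(my_wchars)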
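
One caveat before relying on wchar_t for a UCS2 API: its width is platform-dependent (commonly 2 bytes on Windows and 4 bytes on most Unix-like systems, which as far as I know includes FreeBSD). A quick way to check from plain Python, as a minimal sketch using only the standard ctypes module:

```python
import ctypes

# Size of the platform's wchar_t in bytes.
wchar_size = ctypes.sizeof(ctypes.c_wchar)
print("sizeof(wchar_t):", wchar_size)

if wchar_size != 2:
    # A wchar_t buffer would then hold 32-bit units, not the 16-bit units UCS2 expects.
    print("wchar_t is not 16 bits here; a wchar_t buffer will not match a UCS2 API")
```

If the size is not 2, the wchar_t-based conversions in this answer produce code units of the wrong width for a 16-bit API, and the buffer would have to be built some other way.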

A less pleasant Python 2 approach follows, but it may rely on undefined or broken behaviour, so I would not trust it too readily:

# cython: language_level=2

from cpython.ref cimport PyObject
from libc.stddef cimport wchar_t
from libc.stdlib cimport malloc, free

cdef extern from "Python.h":
    ctypedef PyObject PyUnicodeObject
    Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *o, wchar_t *w, Py_ssize_t size)

my_string = u"Foobar\n"
# Cheating: the "UTF-16" codec prepends a 2-byte BOM, so this is the number of
# UTF-16 code units plus one, which leaves room for the terminator.
cdef Py_ssize_t length = len(my_string.encode("UTF-16")) // 2
cdef wchar_t *my_wchars = <wchar_t *>malloc(length * sizeof(wchar_t))
cdef Py_ssize_t number_written
if my_wchars == NULL:
    raise MemoryError()
number_written = PyUnicode_AsWideChar(<PyUnicodeObject *>my_string, my_wchars, length)

# wprintf breaks things for some reason
print [my_wchars[i] for i in range(length)]
print "Length:", <long>length
print "Number Written:", <long>number_written
print "Null End:", my_wchars[number_written] == 0

free(my_wchars)
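
Since the question asks for 16-bit units specifically, another option is to skip wchar_t and build the 16-bit array directly from the UTF-16 encoding. The following is only a sketch under stated assumptions: it is written for Python 3 (on Python 2, `array.fromstring` would take the place of `frombytes`), the helper name `to_ucs2_units` is hypothetical, and it assumes the C API wants native byte order and a NUL terminator.

```python
import sys
from array import array


def to_ucs2_units(text):
    """Encode BMP-only text as NUL-terminated 16-bit code units (native byte order).

    Hypothetical helper for illustration; not part of Cython or CPython.
    """
    units = array("H")  # unsigned short
    if units.itemsize != 2:
        raise RuntimeError("unsigned short is not 16 bits on this platform")
    # Pick the BOM-less codec that matches the machine's byte order.
    codec = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"
    units.frombytes(text.encode(codec))
    # A surrogate code unit means the text left the BMP, which UCS2 cannot represent.
    if any(0xD800 <= unit <= 0xDFFF for unit in units):
        raise ValueError("text contains code points outside the BMP")
    units.append(0)  # terminator expected by C string APIs (assumption)
    return units


units = to_ucs2_units(u"Foobar\n")
print(list(units))
```

In Cython, an array like this can probably be viewed through a typed memoryview such as `unsigned short[::1]`, with the address of its first element passed to the C function, but I have not tested that against the API in question.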