用cython聲明一個numpy數組奇怪地會產生很多開銷

我正在用Cython重寫一些Python代碼。用cython聲明一個numpy數組奇怪地會產生很多開銷

繼建議in the documentation我開始與優化用Cython定義代替我的蟒蛇陣列。

特別地，以下被認爲是聲明一個numpy的陣列的「最好」的方式：

# cython: profile=True 
# cython: boundscheck=False 
# cython: wraparound=False 

import numpy as np 
cimport numpy as np 

cpdef test(): 

    cdef np.ndarray[np.int_t, ndim=1] seeds_idx = np.empty(10, dtype=np.int) 

    pass

然而，通過仿形經由cython -a my_file.pyx上面的代碼生成的HTML文件顯示以下內容：

+10:  cdef np.ndarray[np.int_t, ndim=1] seeds_idx = np.empty(10, dtype=np.int) 
    __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_1); 
    __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_2); 
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; 
    __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_1); 
    __pyx_t_3 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_3); 
    __pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_int); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_4); 
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0; 
    if (PyDict_SetItem(__pyx_t_1, __pyx_n_s_dtype, __pyx_t_4) < 0) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0; 
    __pyx_t_4 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_tuple_, __pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_4); 
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; 
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; 
    if (!(likely(((__pyx_t_4) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_4, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 10, __pyx_L1_error) 
    __pyx_t_5 = ((PyArrayObject *)__pyx_t_4); 
    { 
    __Pyx_BufFmt_StackElem __pyx_stack[1]; 
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer, (PyObject*)__pyx_t_5, &__Pyx_TypeInfo_nn___pyx_t_5numpy_int_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) { 
     __pyx_v_seeds_idx = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.buf = NULL; 
     __PYX_ERR(0, 10, __pyx_L1_error) 
    } else {__pyx_pybuffernd_seeds_idx.diminfo[0].strides = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_seeds_idx.diminfo[0].shape = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.shape[0]; 
    } 
    } 
    __pyx_t_5 = 0; 
    __pyx_v_seeds_idx = ((PyArrayObject *)__pyx_t_4); 
    __pyx_t_4 = 0; 
/* … */ 
    __pyx_tuple_ = PyTuple_Pack(1, __pyx_int_10); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_tuple_); 
    __Pyx_GIVEREF(__pyx_tuple_);

這是關於Python 2.7獲得具有用Cython 0.24和numpy的1.10.4。

在另一方面，很簡單的聲明seeds_idx = np.empty(10)結果：

+10:  seeds_idx = np.empty(10) 
    __pyx_t_1 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_1); 
    __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_empty); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_2); 
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0; 
    __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_2, __pyx_tuple_, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_t_1); 
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0; 
    __pyx_v_seeds_idx = __pyx_t_1; 
    __pyx_t_1 = 0; 
/* … */ 
    __pyx_tuple_ = PyTuple_Pack(1, __pyx_int_10); if (unlikely(!__pyx_tuple_)) __PYX_ERR(0, 10, __pyx_L1_error) 
    __Pyx_GOTREF(__pyx_tuple_); 
    __Pyx_GIVEREF(__pyx_tuple_);

這是怎麼回事錯在這裏（如果有的話）？謝謝！

來源

2016-05-24 Gioker

沒有什麼錯，numpy數組是複雜（但非常高效）的數據結構。你可以嘗試使用[typed memoryviews]（http://docs.cython.org/src/userguide/memoryviews.html），它們通常更快，並且可以很容易地轉換爲numpy數組。 –

另一點值得注意的是，在分配數組中有__is__開銷。使用快速但分配它的數組可能會慢一點，所以儘量不要做不必要的事情。 – DavidW

我明白了，所以在聲明過程中有一個小的開銷，但是分配/訪問/等等要快得多。 – Gioker

正如評論所述，這裏沒有什麼不妥，所以不必擔心。另外，請記住，您正在檢查爲簡單分配生成的代碼，任何差異都不會影響性能。

雖然是小勘誤，但在第二種情況下seeds_idx = np.empty(10)應改爲seeds_idx = np.empty(10, dtype=np.int)以匹配第一個。

如果添加了，那麼這是用於存儲函數調用（np.empty）的參數創建字典還補充說：

__pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error) 
__Pyx_GOTREF(__pyx_t_1);

查找的np.int：

__pyx_t_3 = __Pyx_GetModuleGlobalName(__pyx_n_s_np); if (unlikely(!__pyx_t_3)) __PYX_ERR(0, 10, __pyx_L1_error) 
__Pyx_GOTREF(__pyx_t_3); 
__pyx_t_4 = __Pyx_PyObject_GetAttrStr(__pyx_t_3, __pyx_n_s_int); if (unlikely(!__pyx_t_4)) __PYX_ERR(0, 10, __pyx_L1_error) 
__Pyx_GOTREF(__pyx_t_4); 
__Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

和在新創建的字典中的參數設置完成：

if (PyDict_SetItem(__pyx_t_1, __pyx_n_s_dtype, __pyx_t_4) < 0) __PYX_ERR(0, 8, __pyx_L1_error) 
__Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;

除了這些，他們之間的唯一區別是：

if (!(likely(((__pyx_t_4) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_4, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 10, __pyx_L1_error) 
    __pyx_t_5 = ((PyArrayObject *)__pyx_t_4); 
    { 
    __Pyx_BufFmt_StackElem __pyx_stack[1]; 
    if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer, (PyObject*)__pyx_t_5, &__Pyx_TypeInfo_nn___pyx_t_5numpy_int_t, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) { 
     __pyx_v_seeds_idx = ((PyArrayObject *)Py_None); __Pyx_INCREF(Py_None); __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.buf = NULL; 
     __PYX_ERR(0, 10, __pyx_L1_error) 
    } else {__pyx_pybuffernd_seeds_idx.diminfo[0].strides = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_seeds_idx.diminfo[0].shape = __pyx_pybuffernd_seeds_idx.rcbuffer->pybuffer.shape[0]; 
    } 
    }

其中，as stated in the documentation you linked是爲了有數據緩衝區快速訪問最有可能進行。

的最佳替代品，到目前爲止，使用typed memoryviews。這些是本地方式，很可能是在cython中使用數組的最簡單方法。 Their performance is usually on par with numpy arrays如果不是，您可以隨時輕鬆切換它們。

來源

2016-05-24 10:06:55

然後我的問題是，通過這種方式直接聲明字典來使用numpy數組有什麼優勢？ – Gioker

我對字典的創作有些模糊;每當你想將關鍵字參數傳遞給一個函數時，就會創建一個字典，所以在'np.empty（10，dtype = np.int）'的情況下，會創建一個字典來存儲參數。在'np.empty（10）'的情況下，不會創建字典。總是建議聲明數組的類型，以便Cython可以基於它優化生成的「c」代碼。我更新了我的答案，在* typed memoryviews *中包含了一些鏈接，這是在Cython中執行任務的明智方式。 –

噢好吧，只是爲了關鍵字...我還是想了很多開銷。關於類型化的memoryview，我發佈了一個新的問題[here]（http://stackoverflow.com/questions/37432076/cython-typed-memoryviews-what-they-really-are）。謝謝吉姆！ – Gioker

用cython聲明一個numpy數組奇怪地會產生很多開銷

回答

相關問題