Selecting elements from a numpy array with multiple conditions

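I am looking for the fastest way to select the elements of a numpy array that satisfy several conditions. For example, say I want to select all elements between 0.2 and 0.8 from an array. I usually do it like this: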

import numpy as np

the_array = np.random.random(100000)
# combine the two boolean masks element-wise
idx = (the_array > 0.2) & (the_array < 0.8)
selected_elements = the_array[idx]

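However, this creates two additional arrays with the same number of elements as the_array (one for the_array > 0.2 and one for the_array < 0.8). If the array is large, this can consume a lot of memory. Is there any way around this? All the built-in numpy functions (such as logical_and) seem to do the same thing.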

Source

2014-03-27 user3468216

It sounds like you want the most memory-efficient way, not the fastest way. The two are often not the same. – M4rtini

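Each of those boolean masks takes only 1/8 of the memory of your array if it holds doubles, so it is typically not an issue. If you are concerned about memory rather than speed, you could sort the array and then find the first and last indices with two calls to searchsorted. – Jaime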
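A minimal sketch of both points in the comment above, assuming a float64 input and the open interval (0.2, 0.8) from the question. The variable names are illustrative only. Note that this approach returns the selected elements in sorted order, not in their original order.

```python
import numpy as np

rng = np.random.default_rng(0)
the_array = rng.random(100_000)  # float64 by default

# A boolean mask stores 1 byte per element; a float64 array stores 8.
mask = the_array > 0.2
mask_bytes, array_bytes = mask.nbytes, the_array.nbytes

# Sort once, then locate the bounds of the interval with two binary searches.
# np.sort returns a sorted copy; the_array.sort() would sort in place instead.
sorted_array = np.sort(the_array)
lo = np.searchsorted(sorted_array, 0.2, side="right")  # first index with value > 0.2
hi = np.searchsorted(sorted_array, 0.8, side="left")   # first index with value >= 0.8

# Basic slicing returns a view, so no further copy is made here.
selected = sorted_array[lo:hi]
```

If the original order matters, or the array must not be modified and a sorted copy is too expensive, this approach is probably not a good fit.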

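Well, I care about both speed and memory efficiency. To me, the most obvious implementation in a compiled language like C would be to simply loop through the array, test each element, and save the ones that pass the test. That should be both faster and more memory-efficient than the example I posted above, which actually has to loop through the array three times. What I am looking for is a way to do this with a numpy array, but maybe that is not possible. – user3468216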
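One way to approximate that single pass while staying in numpy is to scan the array in blocks, so that the temporary masks are only as large as one block. This is a sketch, not something proposed in the thread: `select_chunked` and its `chunk` parameter are hypothetical names, and it assumes a one-dimensional input.

```python
import numpy as np

def select_chunked(arr, lower, upper, chunk=65536):
    """Hypothetical helper: return the elements x of a 1-D array with lower < x < upper."""
    out = np.empty_like(arr)  # worst case: every element passes the test
    count = 0
    for start in range(0, arr.size, chunk):
        block = arr[start:start + chunk]               # a view, no copy
        keep = block[(block > lower) & (block < upper)]  # block-sized temporaries only
        out[count:count + keep.size] = keep
        count += keep.size
    return out[:count]                                  # view onto the filled prefix

the_array = np.random.random(100000)
selected_elements = select_chunked(the_array, 0.2, 0.8)
```

The output buffer is still as large as the input, but the boolean temporaries never exceed `chunk` elements. Whether this is faster than the plain mask depends on the array size and should be measured.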

You can implement a custom C call for the selection. The most basic way to do this is through a ctypes implementation.

select.c

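/* Copy every element of in[] that lies strictly between lower and upper
   into out[], and return how many elements were copied.
   out[] must have room for at least n elements. */
int select(float lower, float upper, float* in, float* out, int n)
{
    int ii;
    int outcount = 0;
    float val;
    for (ii = 0; ii < n; ii++)
    {
        val = in[ii];
        if ((val > lower) && (val < upper))
        {
            out[outcount] = val;
            outcount++;
        }
    }
    return outcount;
}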

Which is compiled with:

gcc -shared -fPIC select.c -o lib.so

And on the Python side:

select.py

import ctypes as C 
from numpy.ctypeslib import as_ctypes 
import numpy as np 

# open the library in python 
lib = C.CDLL("./lib.so") 

# explicitly tell ctypes the argument and return types of the function 
pfloat = C.POINTER(C.c_float) 
lib.select.argtypes = [C.c_float,C.c_float,pfloat,pfloat,C.c_int] 
lib.select.restype = C.c_int 

size = 1000000 

# create numpy arrays 
np_input = np.random.random(size).astype(np.float32) 
np_output = np.empty(size, dtype=np.float32)

# expose the array contents to ctypes 
ctypes_input = as_ctypes(np_input) 
ctypes_output = as_ctypes(np_output) 

# call the function and get the number of selected points 
outcount = lib.select(0.2,0.8,ctypes_input,ctypes_output,size) 

# select those points 
selected = np_output[:outcount]

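Don't expect wild speedups from a vanilla implementation like this, but on the C side you have the option of adding OpenMP pragmas to get quick and dirty parallelism, which may give you a significant boost.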

Also, as mentioned in the comments, numexpr may be a neater, faster way of doing all this in just a few lines.
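A minimal sketch of what the numexpr route might look like, assuming the third-party `numexpr` package is installed. The name `a` inside the expression string is just a placeholder bound through `local_dict`. numexpr evaluates the whole expression in blocks, so the two intermediate full-size masks are not materialised, although the final boolean mask still is.

```python
import numpy as np
import numexpr as ne

the_array = np.random.random(100000)

# Evaluate both comparisons and the logical AND in one pass over the data.
idx = ne.evaluate("(a > 0.2) & (a < 0.8)", local_dict={"a": the_array})
selected_elements = the_array[idx]
```

Timings depend on array size and core count, so it is worth benchmarking against the plain numpy mask before switching.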

Source

2014-03-27 13:47:26 ebarr
