What is the best way to compute np.percentile along an axis while ignoring NaNs?

Is there a reasonably fast way to do np.percentile(ndarr, axis=0) on data that contains NaN values?

For np.median there is the corresponding bottleneck.nanmedian (https://pypi.python.org/pypi/Bottleneck), which works quite well.

For percentiles, the best I have come up with so far, which is incomplete and incorrect, is:

from bottleneck import nanrankdata, nanmax, nanargmin

def nanpercentile(x, q, axis):
    ranks = nanrankdata(x, axis=axis)
    peak = nanmax(ranks, axis=axis)
    pct = ranks / peak * 100.  # convert ranks to percentiles (0-100)
    wh = nanargmin(abs(pct - q), axis=axis)
    return x[wh]  # does not work: fancy-indexes along axis 0, not along `axis`

This doesn't work; what is really needed is some way to take the nth element along the axis, but I haven't found a numpy slicing trick that does that.
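For reference, NumPy 1.15 and later (newer than this question) provide np.take_along_axis, which performs exactly this "pick one element per slice along an axis" step when given an index array such as the one nanargmin returns. A minimal sketch on the example array used below, where the per-slice index is simply the position of the smallest non-NaN value (a stand-in for whatever index the percentile logic would compute):

```python
import numpy as np

x = np.array([[[1, 2, 3], [6, np.nan, 4]], [[0.5, 2, 1], [9, 3, np.nan]]])

# One index per length-3 slice along axis 2 (shape (2, 2)): here just the
# position of the smallest non-NaN value in each slice.
wh = np.nanargmin(x, axis=2)

# take_along_axis needs an index array with the same ndim as x, so add a
# length-1 axis at position 2, pick the elements, then drop that axis again.
picked = np.take_along_axis(x, wh[..., np.newaxis], axis=2)[..., 0]
print(picked)  # same values as np.nanmin(x, axis=2)
```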

By "reasonably fast" I mean something better than looping over the indices, e.g.:

import numpy as np

q = 40
x = np.array([[[1,2,3],[6,np.nan,4]],[[0.5,2,1],[9,3,np.nan]]])
out = np.empty(x.shape[:-1])
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        d = x[i,j,:]
        out[i,j] = np.percentile(d[np.isfinite(d)], q)

print(out)

#array([[ 1.8, 4.8], 
#  [ 0.9, 5.4]])

This works, but it can be very slow.
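The same per-slice computation can be written more compactly with np.apply_along_axis. This is only a convenience: apply_along_axis still calls the function once per slice from Python, so it should not be expected to be much faster than the explicit loop. finite_percentile is a hypothetical helper name used for illustration.

```python
import numpy as np

q = 40
x = np.array([[[1, 2, 3], [6, np.nan, 4]], [[0.5, 2, 1], [9, 3, np.nan]]])

def finite_percentile(d, q):
    # Drop NaN (and +/-inf) values before calling the plain percentile.
    return np.percentile(d[np.isfinite(d)], q)

# Apply to every 1-D slice along axis 2; extra args are forwarded to the function.
out = np.apply_along_axis(finite_percentile, 2, x, q)
print(out)
```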

np.ma doesn't seem to work as expected; np.percentile treats the masked nan values as if they were inf:

xm = np.ma.masked_where(np.isnan(x),x) 
print(np.percentile(xm,40,axis=2))

# array([[ 1.8, 5.6], 
#  [ 0.9, 7.8]])
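For comparison, numpy.ma does provide a mask-aware median, np.ma.median, which skips masked entries along the axis; the behaviour above comes from calling the plain np.percentile, which works on the underlying data rather than honouring the mask. A quick check on the same xm:

```python
import numpy as np

x = np.array([[[1, 2, 3], [6, np.nan, 4]], [[0.5, 2, 1], [9, 3, np.nan]]])
xm = np.ma.masked_where(np.isnan(x), x)

# np.ma.median ignores masked entries, so each slice's median is taken over
# the non-NaN values only; np.ma.filled turns the result into a plain ndarray.
med = np.ma.filled(np.ma.median(xm, axis=2), np.nan)
print(med)
```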


2014-04-17 keflavich

np.nanpercentile is included in numpy 1.9.0:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.nanpercentile.html
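With NumPy 1.9 or later the question reduces to a single call. A minimal sketch on the example array from the question; it also shows passing several percentiles at once, and turning a masked array back into NaNs with filled(np.nan) so that nanpercentile can skip them:

```python
import numpy as np

x = np.array([[[1, 2, 3], [6, np.nan, 4]], [[0.5, 2, 1], [9, 3, np.nan]]])

# 40th percentile along the last axis, ignoring NaNs (default linear interpolation).
p40 = np.nanpercentile(x, 40, axis=2)
print(p40)

# Several percentiles at once: the q axis comes first in the result.
p = np.nanpercentile(x, [25, 40], axis=2)
print(p.shape)  # (2, 2, 2)

# A masked array can be converted back to NaNs first.
xm = np.ma.masked_where(np.isnan(x), x)
p40_masked = np.nanpercentile(xm.filled(np.nan), 40, axis=2)
```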


2015-10-30 02:46:14 Maximilian

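You can use partition(), available since numpy 1.8, to take the nth element along an axis. Here is code that gets the second element along the last axis:

import numpy as np

x = np.array([[[1,2,3],[6,np.nan,4]],[[0.5,2,1],[9,3,np.nan]]])
np.partition(x, 1)[..., 1]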

Output:

array([[ 2., 6.], 
     [ 1., 9.]])


2014-04-17 04:52:43 HYRY

Interesting; I'm using 1.7.1. What does this return? – wwii

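Would this work if you want to take the nth element along an axis where n is not constant? Also, partition creates a copy in memory, which is undesirable but may be unavoidable. – keflavich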
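One way to handle an n that differs from slice to slice (for example, because each slice has a different number of non-NaN values) is to sort once and then pick elements with per-slice indices. The sketch below uses np.take_along_axis (NumPy 1.15+) and relies on np.sort placing NaNs at the end. nanpercentile_sorted is a hypothetical helper name, not a NumPy function, and it implements only linear interpolation for a scalar q. np.sort also returns a copy, so this does not avoid the memory concern.

```python
import numpy as np

def nanpercentile_sorted(x, q, axis=-1):
    """Hypothetical helper: linear-interpolation percentile along `axis`
    that ignores NaNs. `q` is a scalar in [0, 100]."""
    a = np.moveaxis(np.asarray(x, dtype=float), axis, -1)
    s = np.sort(a, axis=-1)                      # NaNs are sorted to the end
    n = np.count_nonzero(~np.isnan(s), axis=-1)  # non-NaN count per slice
    pos = (q / 100.0) * (n - 1)                  # fractional rank per slice
    lo = np.maximum(np.floor(pos).astype(int), 0)
    hi = np.minimum(lo + 1, np.maximum(n - 1, 0))
    frac = pos - lo
    # Pick the two neighbouring order statistics, one index per slice.
    v_lo = np.take_along_axis(s, np.expand_dims(lo, -1), axis=-1)[..., 0]
    v_hi = np.take_along_axis(s, np.expand_dims(hi, -1), axis=-1)[..., 0]
    # All-NaN slices have n == 0, so v_lo is NaN and the result is NaN.
    return v_lo + frac * (v_hi - v_lo)
```

Usage: nanpercentile_sorted(x, 40, axis=2) on the question's array gives the same values as the explicit loop.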

You can manipulate the array's strides to iterate over it faster, using as_strided(), which is found in numpy.lib.stride_tricks.

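Your computation can be viewed as operating on (1,1,3) windows of your array. I like to use a generalized function, sliding_window(), which uses as_strided() to create N-by-N windows; I found it here: Efficient Overlapping Windows with Numpy. Credit for the function obviously goes to johnvinyard, and that blog page is a good explanation of what is going on.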

Make some 1x1x3 windows:

import numpy as np 
x = np.array([[[1,2,3],[6,np.nan,4]],[[0.5,2,1],[9,3,np.nan]]]) 
for thing in sliding_window(x, (1,1,3)):  # sliding_window() is defined at the end of this answer
    print(thing)

# [ 1. 2. 3.] 
# [ 6. nan 4.] 
# [ 0.5 2. 1. ] 
# [ 9. 3. nan]
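Because these particular (1,1,3) windows neither overlap nor cross slice boundaries, they are simply the rows of x.reshape(-1, 3), so the same iteration can also be done without stride tricks. A small check, assuming x is C-contiguous (as the array literal above is); the as_strided call builds the equivalent view by hand:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([[[1, 2, 3], [6, np.nan, 4]], [[0.5, 2, 1], [9, 3, np.nan]]])

# Plain reshape: one row per (i, j) slice along the last axis.
rows = x.reshape(-1, x.shape[-1])

# The same view built by hand: consecutive rows are x.strides[1] bytes apart
# (valid only because x is C-contiguous).
view = as_strided(x, shape=(x.shape[0] * x.shape[1], x.shape[2]),
                  strides=(x.strides[1], x.strides[2]))

for row in rows:
    print(row)
```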

Apply np.percentile(), ignoring the NaNs:

for thing in sliding_window(x, (1,1,3)):
    print(np.percentile(thing[np.isfinite(thing)], 40))

# 1.8 
# 4.8 
# 0.9 
# 5.4

Make an array of the results:

per_s = [np.percentile(thing[np.isfinite(thing)], 40)
         for thing in sliding_window(x, (1,1,3))]

print(per_s)
# [1.8, 4.8000000000000007, 0.90000000000000002, 5.4000000000000004]

per_s = np.array(per_s)
print(per_s)
# array([ 1.8, 4.8, 0.9, 5.4])

Reshape it back to the shape you want:

print(per_s.reshape((2,2)))
# array([[ 1.8, 4.8], 
#  [ 0.9, 5.4]]) 

print(per_s.reshape(x.shape[:-1]))
# array([[ 1.8, 4.8], 
#  [ 0.9, 5.4]])

This should be faster. I'm curious whether it actually is; I don't have a real-world problem to test it on.
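One way to find out is to time it. The sketch below compares the explicit loop from the question, a list comprehension over the rows of a.reshape(-1, n) (which is what iterating over these non-overlapping (1,1,3) windows amounts to), and np.nanpercentile (NumPy 1.9+). The array size and NaN fraction are arbitrary assumptions, and no timings are claimed here, since they depend on the machine, the array shape and the NumPy version. Note that the window-based version still calls np.percentile once per slice from Python.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((50, 50, 50))
data[data < 0.1] = np.nan          # about 10% NaNs (arbitrary choice)
q = 40

def loop_version(a, q):
    # Explicit loop over all leading indices, as in the question.
    out = np.empty(a.shape[:-1])
    for idx in np.ndindex(*a.shape[:-1]):
        d = a[idx]
        out[idx] = np.percentile(d[np.isfinite(d)], q)
    return out

def rows_version(a, q):
    # Iterate over the non-overlapping windows, i.e. the rows of a 2-D view.
    rows = a.reshape(-1, a.shape[-1])
    res = np.array([np.percentile(r[np.isfinite(r)], q) for r in rows])
    return res.reshape(a.shape[:-1])

def nan_version(a, q):
    # Library routine (NumPy 1.9+).
    return np.nanpercentile(a, q, axis=-1)

for f in (loop_version, rows_version, nan_version):
    t = timeit.timeit(lambda: f(data, q), number=3)
    print(f"{f.__name__}: {t / 3:.4f} s per call")
```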

Googling numpy as_strided turned up some good results; I have this one bookmarked: http://scipy-lectures.github.io/advanced/advanced_numpy/

sliding_window(), from Efficient Overlapping Windows with Numpy:

import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
from itertools import product

def norm_shape(shape): 
    ''' 
    Normalize numpy array shapes so they're always expressed as a tuple, 
    even for one-dimensional shapes. 

    Parameters 
     shape - an int, or a tuple of ints 

    Returns 
     a shape tuple 
    ''' 
    try:
        i = int(shape)
        return (i,)
    except TypeError:
        # shape was not a number
        pass

    try:
        t = tuple(shape)
        return t
    except TypeError:
        # shape was not iterable
        pass

    raise TypeError('shape must be an int, or a tuple of ints')


def sliding_window(a,ws,ss = None,flatten = True): 
    ''' 
    Return a sliding window over a in any number of dimensions 

    Parameters: 
     a - an n-dimensional numpy array 
     ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size 
      of each dimension of the window 
     ss - an int (a is 1D) or tuple (a is 2D or greater) representing the 
      amount to slide the window in each dimension. If not specified, it 
      defaults to ws. 
     flatten - if True, all slices are flattened, otherwise, there is an 
        extra dimension for each dimension of the input. 

    Returns 
     an array containing each n-dimensional window from a 
    ''' 

    if ss is None:
        # ss was not provided. the windows will not overlap in any direction.
        ss = ws
    ws = norm_shape(ws) 
    ss = norm_shape(ss) 

    # convert ws, ss, and a.shape to numpy arrays so that we can do math in every 
    # dimension at once. 
    ws = np.array(ws) 
    ss = np.array(ss) 
    shape = np.array(a.shape) 


    # ensure that ws, ss, and a.shape all have the same number of dimensions 
    ls = [len(shape),len(ws),len(ss)] 
    if 1 != len(set(ls)): 
        raise ValueError(
            'a.shape, ws and ss must all have the same length. They were %s' % str(ls))

    # ensure that ws is smaller than a in every dimension 
    if np.any(ws > shape): 
        raise ValueError('ws cannot be larger than a in any dimension. a.shape was %s and ws was %s' % (str(a.shape),str(ws)))

    # how many slices will there be in each dimension? 
    newshape = norm_shape(((shape - ws) // ss) + 1) 
    # the shape of the strided array will be the number of slices in each dimension 
    # plus the shape of the window (tuple addition) 
    newshape += norm_shape(ws) 
    # the strides tuple will be the array's strides multiplied by step size, plus 
    # the array's strides (tuple addition) 
    newstrides = norm_shape(np.array(a.strides) * ss) + a.strides 
    strided = ast(a,shape = newshape,strides = newstrides) 
    if not flatten:
        return strided

    # Collapse strided so that it has one more dimension than the window. I.e., 
    # the new array is a flat list of slices. 
    meat = len(ws) if ws.shape else 0 
    firstdim = (np.prod(newshape[:-meat]),) if ws.shape else ()
    dim = firstdim + (newshape[-meat:]) 
    # remove any dimensions with size 1 
    #dim = filter(lambda i : i != 1,dim) 
    dim = tuple(thing for thing in dim if thing != 1) 
    return strided.reshape(dim)


2014-04-17 05:48:36 wwii

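If you don't need an ultra-fast solution, you can first convert your array to a pandas DataFrame, compute the quantile there, and then convert the result back to a numpy array.

import numpy as np
import pandas as pd

# per-row quantile of a 2-D `array`; DataFrame.quantile skips NaNs and q defaults to 0.5
df = pd.DataFrame(array.T).quantile()
arr = np.array(df)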


2015-08-13 18:28:29
