2017-02-13 25 views
4

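Treat nan as zero in numpy array summation, except where all arrays have nan

I have two numpy arrays, NS and EW, that I want to add together. Each of them has missing values at different positions, like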

NS = array([[ 1., 2., nan], 
     [ 4., 5., nan], 
     [ 6., nan, nan]]) 
EW = array([[ 1., 2., nan], 
     [ 4., nan, nan], 
     [ 6., nan, 9.]]) 

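How can I do the summation in a numpy way that treats nan as zero when only one array has nan at a position, and keeps nan when both arrays have nan at the same position?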

The result I expect to see is

SUM = array([[ 2., 4., nan], 
      [ 8., 5., nan], 
      [ 12., nan, 9.]]) 

When I try

SUM=np.add(NS,EW) 

it gives me

SUM=array([[ 2., 4., nan], 
     [ 8., nan, nan], 
     [ 12., nan, nan]]) 

When I try

SUM = np.nansum(np.dstack((NS,EW)),2) 

it gives me

SUM=array([[ 2., 4., 0.], 
     [ 8., 5., 0.], 
     [ 12., 0., 9.]]) 

Of course, I can reach my goal with element-wise operations,

SUM = np.empty_like(NS)  # allocate the output before filling it
for i in range(np.size(NS,0)): 
    for j in range(np.size(NS,1)): 
        if np.isnan(NS[i,j]) and np.isnan(EW[i,j]): 
            SUM[i,j] = np.nan 
        elif np.isnan(NS[i,j]): 
            SUM[i,j] = EW[i,j] 
        elif np.isnan(EW[i,j]): 
            SUM[i,j] = NS[i,j] 
        else: 
            SUM[i,j] = NS[i,j]+EW[i,j] 

but it is very slow. So I am looking for a more idiomatic numpy solution to this problem.

Thanks for your help!

Answers

4

Approach #1: One approach with np.where -

def sum_nan_arrays(a,b): 
    ma = np.isnan(a) 
    mb = np.isnan(b) 
    return np.where(ma&mb, np.nan, np.where(ma,0,a) + np.where(mb,0,b)) 

Sample run -

In [43]: NS 
Out[43]: 
array([[ 1., 2., nan], 
     [ 4., 5., nan], 
     [ 6., nan, nan]]) 

In [44]: EW 
Out[44]: 
array([[ 1., 2., nan], 
     [ 4., nan, nan], 
     [ 6., nan, 9.]]) 

In [45]: sum_nan_arrays(NS, EW) 
Out[45]: 
array([[ 2., 4., nan], 
     [ 8., 5., nan], 
     [ 12., nan, 9.]]) 

Approach #2: A possibly faster one that mixes in boolean indexing -

def sum_nan_arrays_v2(a,b): 
    ma = np.isnan(a) 
    mb = np.isnan(b) 
    m_keep_a = ~ma & mb 
    m_keep_b = ma & ~mb 
    out = a + b 
    out[m_keep_a] = a[m_keep_a] 
    out[m_keep_b] = b[m_keep_b] 
    return out 
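
To make the logic of approach #2 easier to follow, here is a minimal self-contained sketch that runs it on the arrays from the question. The mask names `only_a_valid` and `only_b_valid` are renamed here for illustration and are not part of the original answer; the behaviour is intended to be the same. The idea is that `a + b` is already nan wherever either input is nan, so only the positions where exactly one input is valid need to be patched.

```python
import numpy as np

def sum_nan_arrays_v2(a, b):
    ma = np.isnan(a)
    mb = np.isnan(b)
    only_a_valid = ~ma & mb   # a has a value, b is nan -> keep a
    only_b_valid = ma & ~mb   # b has a value, a is nan -> keep b
    out = a + b               # nan wherever either input is nan
    out[only_a_valid] = a[only_a_valid]
    out[only_b_valid] = b[only_b_valid]
    return out                # stays nan only where both inputs are nan

nan = np.nan
NS = np.array([[1., 2., nan], [4., 5., nan], [6., nan, nan]])
EW = np.array([[1., 2., nan], [4., nan, nan], [6., nan, 9.]])

result = sum_nan_arrays_v2(NS, EW)
print(result)
```

Neither input is modified, because `a + b` allocates a new array before the patching step.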

Runtime test -

In [140]: # Setup input arrays with 4/9 ratio of NaNs (same as in the question) 
    ...: a = np.random.rand(3000,3000) 
    ...: b = np.random.rand(3000,3000) 
    ...: a.ravel()[np.random.choice(range(a.size), size=4000000, replace=0)] = np.nan 
    ...: b.ravel()[np.random.choice(range(b.size), size=4000000, replace=0)] = np.nan 
    ...: 

In [141]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify 
Out[141]: 0.0 

In [142]: %timeit sum_nan_arrays(a, b) 
10 loops, best of 3: 141 ms per loop 

In [143]: %timeit sum_nan_arrays_v2(a, b) 
10 loops, best of 3: 177 ms per loop 

In [144]: # Setup input arrays with lesser NaNs 
    ...: a = np.random.rand(3000,3000) 
    ...: b = np.random.rand(3000,3000) 
    ...: a.ravel()[np.random.choice(range(a.size), size=4000, replace=0)] = np.nan 
    ...: b.ravel()[np.random.choice(range(b.size), size=4000, replace=0)] = np.nan 
    ...: 

In [145]: np.nanmax(np.abs(sum_nan_arrays(a, b) - sum_nan_arrays_v2(a, b))) # Verify 
Out[145]: 0.0 

In [146]: %timeit sum_nan_arrays(a, b) 
10 loops, best of 3: 69.6 ms per loop 

In [147]: %timeit sum_nan_arrays_v2(a, b) 
10 loops, best of 3: 38 ms per loop 
+0

It works perfectly and is also 200 times faster than the element-wise operation I was using. Thanks for your help! – Superstar

1

I think we can be a bit more concise, along the same lines as Divakar's second approach. With a = NS and b = EW:

na = numpy.isnan(a) 
nb = numpy.isnan(b) 
a[na] = 0 
b[nb] = 0 
a += b 
na &= nb 
a[na] = numpy.nan 

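The operations are done in place where possible to save memory, assuming that is acceptable in your scenario. The final result is in a. Note that b is modified as well: its nan entries are overwritten with 0.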
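
If overwriting the inputs is not acceptable, the same recipe can be sketched so that it writes into one fresh array instead. This is an illustrative variant, not part of the answer above: the function name `sum_nan_arrays_keep_inputs` is hypothetical, and it relies on the `out=` and `where=` keywords of the `np.add` ufunc to skip the positions where b is nan.

```python
import numpy as np

def sum_nan_arrays_keep_inputs(a, b):
    """Hypothetical variant of the in-place recipe that leaves a and b untouched."""
    na = np.isnan(a)
    nb = np.isnan(b)
    out = np.where(na, 0.0, a)          # fresh array: a with its nans zeroed
    np.add(out, b, out=out, where=~nb)  # add b only where b is not nan
    out[na & nb] = np.nan               # restore nan where both were missing
    return out

nan = np.nan
NS = np.array([[1., 2., nan], [4., 5., nan], [6., nan, nan]])
EW = np.array([[1., 2., nan], [4., nan, nan], [6., nan, 9.]])

result = sum_nan_arrays_keep_inputs(NS, EW)
print(result)
```

This costs one extra array the size of the inputs, so the fully in-place version remains the better fit when memory is the main constraint.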

+0

Yes, lower memory use is preferable, since the computation may be performed on large matrices. I will switch to your solution in my code. Thanks! – Superstar

2

Actually, your nansum approach almost worked; you just need to put the nans back in afterwards:

def add_ignore_nans(a, b): 
    stacked = np.array([a, b]) 
    res = np.nansum(stacked, axis=0) 
    res[np.all(np.isnan(stacked), axis=0)] = np.nan 
    return res 

>>> add_ignore_nans(a, b) 
array([[ 2., 4., nan], 
     [ 8., 5., nan], 
     [ 12., nan, 9.]]) 

This will be slower than @Divakar's answer, but I wanted to mention that you were very close! :-)
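
One possible advantage of the stacking approach is that it extends to more than two arrays with no extra logic. The sketch below is an assumption-laden illustration rather than something from the question: the function name `add_ignore_nans_many` and the third array `UD` are made up for the example.

```python
import numpy as np

def add_ignore_nans_many(*arrays):
    """Hypothetical generalization: sum any number of same-shaped arrays,
    treating nan as 0 unless every array is nan at that position."""
    stacked = np.stack(arrays)                   # shape (n_arrays, rows, cols)
    res = np.nansum(stacked, axis=0)             # nan counted as 0
    res[np.isnan(stacked).all(axis=0)] = np.nan  # nan where all inputs are nan
    return res

nan = np.nan
NS = np.array([[1., 2., nan], [4., 5., nan], [6., nan, nan]])
EW = np.array([[1., 2., nan], [4., nan, nan], [6., nan, 9.]])
UD = np.array([[nan, 1., nan], [1., nan, nan], [nan, nan, nan]])  # made-up third array

total = add_ignore_nans_many(NS, EW, UD)
print(total)
```

Stacking copies every input into one larger array, so this trades memory for generality.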

+0

I see, I was missing an extra logical "and" step to filter the indices. Thanks for your help! – Superstar