Python: replacing no-data values in an array

I have a 1-dimensional dataset in which the no-data values are set to 9999. Here is an extract, as it is quite long:

this_array = [ 4, 4, 1, 9999, 9999, 9999, -5, -4, ... ]

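I want to replace each no-data value with the average of the closest values on either side. However, some no-data values have nearest neighbours that are also no-data, which makes the replacement a bit more difficult. In the extract above, for example, I would like the three no-data values to be replaced by -2. I have created a loop that goes through each scalar in the array and tests for no data: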

for k in this_array:
    if k == 9999:
        temp = np.where(k == 9999, (abs(this_array[k-1]-this_array[k+1])/2), this_array[k])
    else:
        pass
this_array[k] = temp

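However, I need to add an if statement, or some other way of taking the value before k-1 or after k+1 when those are also equal to 9999, for example: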

if np.logical_or(k+1 == 9999, k-1 == 9999): 
    temp = np.where(k == 9999, (abs(this_array[k-2]-this_array[k+2])/2), this_array[k])

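As you can see, this code quickly becomes messy: one could end up taking the wrong value, or end up with loads of nested if statements. Does anyone know a cleaner way to implement this, given that the length of the no-data runs varies a lot throughout the dataset?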

As requested: if the first and/or last points are no-data, they should preferably be replaced with the nearest data point.

2012-12-18 AJEnvMap

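What should happen if the last element in the list is 9999? What value would you want to replace it with? – Cameron

@Cameron Apologies. If the last element is 9999, it can be replaced with the second-to-last element. Thanks. – AJEnvMap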

There may be a more efficient way to do this with numpy functions, but here is a solution using the itertools module:

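from itertools import groupby

# Assumes this_array is a NumPy array and that neither end is 9999.
# Group consecutive indices by whether they hold the no-data value.
for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    if k:
        indices = list(g)
        # Average of the valid values on either side of the run.
        new_v = (this_array[indices[0]-1] + this_array[indices[-1]+1])/2
        this_array[indices[0]:indices[-1]+1].fill(new_v)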
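To see why this works, it may help to look at what groupby actually produces. The following is only an illustrative sketch on the extract from the question (the name runs is made up for this example): consecutive indices are grouped by whether the value at that index equals 9999, so each no-data run arrives as one list of indices.

```python
from itertools import groupby

this_array = [4, 4, 1, 9999, 9999, 9999, -5, -4]

# Pair each run of consecutive indices with a flag saying whether
# the run consists of no-data values (True) or valid values (False).
runs = [(is_gap, list(idx))
        for is_gap, idx in groupby(range(len(this_array)),
                                   key=lambda i: this_array[i] == 9999)]

for is_gap, idx in runs:
    print(is_gap, idx)
# False [0, 1, 2]
# True [3, 4, 5]
# False [6, 7]
```

The first and last index of each True run are all that is needed to find the valid neighbours on either side.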

If the last element or the first element can be 9999, use the following instead:

from itertools import groupby

for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
    if k:
        indices = list(g)
        prev_i, next_i = indices[0]-1, indices[-1]+1
        # A run at the start has no left neighbour; use the right one instead.
        before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
        # A run at the end has no right neighbour; reuse the left one.
        after = this_array[next_i] if next_i != len(this_array) else before
        this_array[indices[0]:next_i].fill((before + after)/2)

Example using the second version:

>>> import numpy as np
>>> from itertools import groupby
>>> this_array = np.array([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999])
>>> for k, g in groupby(range(len(this_array)), lambda i: this_array[i] == 9999):
...     if k:
...         indices = list(g)
...         prev_i, next_i = indices[0]-1, indices[-1]+1
...         before = this_array[prev_i] if prev_i != -1 else this_array[next_i]
...         after = this_array[next_i] if next_i != len(this_array) else before
...         this_array[indices[0]:next_i].fill((before + after)/2)
...
>>> this_array
array([ 4,  4,  1, -2, -2, -2, -5, -4, -4])
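One thing to watch for: the example array has an integer dtype, so an average such as -1.5 cannot be stored exactly when it is written back. A possible way around this, sketched below, is to wrap the same groupby logic in a function that works on a float copy. The function name fill_gaps and its nodata parameter are assumptions made for this illustration, not part of the answer above.

```python
from itertools import groupby

import numpy as np


def fill_gaps(values, nodata=9999):
    """Return a float copy of values with each no-data run replaced
    by the mean of the valid values on either side of the run."""
    out = np.array(values, dtype=float)  # float copy; the input is not modified
    n = len(out)
    for is_gap, group in groupby(range(n), key=lambda i: out[i] == nodata):
        if not is_gap:
            continue
        idx = list(group)
        lo, hi = idx[0] - 1, idx[-1] + 1
        if lo < 0 and hi >= n:
            raise ValueError("array contains no valid data")
        # At either end only one neighbour exists, so it is used for both sides.
        before = out[lo] if lo >= 0 else out[hi]
        after = out[hi] if hi < n else before
        out[idx[0]:hi] = (before + after) / 2
    return out


print(fill_gaps([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999]).tolist())
# [4.0, 4.0, 1.0, -2.0, -2.0, -2.0, -5.0, -4.0, -4.0]
print(fill_gaps([1, 9999, -4]).tolist())
# [1.0, -1.5, -4.0]
```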

2012-12-18 22:02:40

I would do something roughly along these lines:

import numpy as np

def fill(arr, fwd_fill):
    """Replace each NaN with the nearest preceding (or following) value."""
    out = arr.copy()
    if fwd_fill:
        start, end, step = 0, len(out), 1
    else:
        start, end, step = len(out)-1, -1, -1
    cur = out[start]
    for i in range(start, end, step):
        if np.isnan(out[i]):
            out[i] = cur
        else:
            cur = out[i]
    return out

def avg(arr):
    fwd = fill(arr, True)
    back = fill(arr, False)
    # Entry i pairs the forward fill at i with the backward fill at i+2,
    # so the result has len(arr)-2 entries (both end elements are dropped).
    return (fwd[:-2] + back[2:])/2.

arr = np.array([ 4, 4, 1, np.nan, np.nan, np.nan, -5, -4])
print(arr)
print(avg(arr))

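The first function can do either a forward or a backward fill, replacing each NaN with the nearest non-NaN value.

Once you have those, computing the average is trivial, and is done by the second function.

You don't say how you want to handle the first and last elements, so the code just chops them off.

Finally, it is worth noting that the function can return NaNs if the first or last element of the input array is missing (in which case there is no data from which to compute some of the averages).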
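The question was later updated to say that leading and trailing gaps should take the nearest data point. One possible variant of the forward/backward fill idea that follows that rule is sketched below. It is not the code from this answer: the function name fill_nan_with_neighbour_mean is made up, it keeps every element instead of chopping the ends, and it leaves valid values unchanged.

```python
import numpy as np


def fill_nan_with_neighbour_mean(arr):
    """Replace each NaN with the mean of the nearest valid values
    on either side; at the ends, use the single nearest valid value."""
    arr = np.asarray(arr, dtype=float)
    fwd = arr.copy()
    back = arr.copy()
    # Forward fill: carry the last valid value to the right.
    for i in range(1, len(fwd)):
        if np.isnan(fwd[i]):
            fwd[i] = fwd[i - 1]
    # Backward fill: carry the next valid value to the left.
    for i in range(len(back) - 2, -1, -1):
        if np.isnan(back[i]):
            back[i] = back[i + 1]
    # A leading gap has no forward-fill value and a trailing gap has no
    # backward-fill value, so each falls back to the other side.
    fwd = np.where(np.isnan(fwd), back, fwd)
    back = np.where(np.isnan(back), fwd, back)
    # For valid entries fwd == back == original, so they are unchanged.
    return (fwd + back) / 2


nan = np.nan
result = fill_nan_with_neighbour_mean([nan, 4, 1, nan, nan, nan, -5, -4, nan])
print(result.tolist())
# [4.0, 4.0, 1.0, -2.0, -2.0, -2.0, -5.0, -4.0, -4.0]
```

If the input contains no valid data at all, every entry stays NaN.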

2012-12-18 22:15:56 NPE

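Here is a recursive solution for the case where the first and last elements are not 9999. You could probably clean it up with generators, since the recursion can get deep. It is a reasonable starting point: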

def a(lst, first, depth):
    # lst: remaining values; first: last valid value seen;
    # depth: number of 9999 entries skipped since then.
    if not lst:
        return []
    car = lst[0]
    cdr = lst[1:]
    if car == 9999:
        return a(cdr, first, depth+1)
    if depth != 0:
        avg = [(first + car) / 2] * depth
        return avg + [car] + a(cdr, car, 0)
    else:
        return [car] + a(cdr, car, 0)


print(a([1, 2, 9999, 4, 9999, 9999, 12], 0, 0))
# => [1, 2, 3.0, 4, 8.0, 8.0, 12]
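The generator clean-up mentioned above might look roughly like the sketch below. It is an illustration, not the answerer's code: the name fill_runs and the nodata parameter are assumptions, and it also handles leading and trailing runs by reusing the nearest valid value, which the recursive version does not attempt. Because it only counts pending no-data values, it never recurses.

```python
def fill_runs(values, nodata=9999):
    """Yield values with each no-data run replaced by the mean of the
    valid values on either side of it."""
    previous = None  # last valid value seen so far
    pending = 0      # no-data values seen since then
    for v in values:
        if v == nodata:
            pending += 1
            continue
        # A leading run has no left neighbour, so it takes the right one.
        fill = v if previous is None else (previous + v) / 2
        for _ in range(pending):
            yield fill
        pending = 0
        yield v
        previous = v
    # A trailing run takes the last valid value; if there was no valid
    # value at all, the no-data markers are passed through unchanged.
    for _ in range(pending):
        yield nodata if previous is None else previous


print(list(fill_runs([1, 2, 9999, 4, 9999, 9999, 12])))
# [1, 2, 3.0, 4, 8.0, 8.0, 12]
print(list(fill_runs([9999, 4, 9999])))
# [4, 4, 4]
```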

2012-12-18 22:45:34

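Well, I'm afraid I have to write an answer of my own: you can use np.interp, or the equivalent (maybe a bit nicer and more featureful) SciPy functions that you can find in scipy.interpolate.

OK, rereading the question... I guess you don't want linear interpolation? In that case this of course does not work, although I'm sure there is some vectorized way to do it.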

import numpy as np
# data is the given array.
data = data.astype(float) # I cast to float, if you don't want that badly...
valid = data != 9999
x = np.nonzero(valid)[0]
replace = np.nonzero(~valid)[0]
valid_data = data[x]

# using np.interp, but I think you will find better things in scipy.interpolate
# if you don't mind using scipy.
data[replace] = np.interp(replace, x, valid_data,
                          left=valid_data[0], right=valid_data[-1])
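For comparison, linear interpolation would fill the three-value gap in the question's extract with -0.5, -2 and -3.5 rather than a flat -2. One possible vectorized way to get the flat average instead is sketched below. It is not part of this answer; the variable names are made up, and it assumes the array contains at least one valid value. The idea is to find, for every position, the index of the nearest valid element on each side with running maxima and minima.

```python
import numpy as np

data = np.array([9999, 4, 1, 9999, 9999, 9999, -5, -4, 9999], dtype=float)
valid = data != 9999
idx = np.arange(len(data))

# Index of the nearest valid element at or before each position (-1 if none).
prev_idx = np.maximum.accumulate(np.where(valid, idx, -1))
# Index of the nearest valid element at or after each position (len(data) if none).
next_idx = np.minimum.accumulate(np.where(valid, idx, len(data))[::-1])[::-1]

# At the ends only one neighbour exists, so reuse it for both sides.
has_prev = prev_idx >= 0
has_next = next_idx < len(data)
prev_idx = np.where(has_prev, prev_idx, next_idx)
next_idx = np.where(has_next, next_idx, prev_idx)

# Valid positions point at themselves on both sides, so they are unchanged.
filled = (data[prev_idx] + data[next_idx]) / 2
print(filled.tolist())
# [4.0, 4.0, 1.0, -2.0, -2.0, -2.0, -5.0, -4.0, -4.0]
```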

2012-12-18 22:58:59 seberg
