2016-11-28 45 views
2

Python pandas DataFrame: keep one non-zero, non-NaN value every n rows

I have a pandas DataFrame defined as follows:

2009-11-18 500.0 
2009-11-19 500.0 
2009-11-20 NaN 
2009-11-23 500.0 
2009-11-24 500.0 
2009-11-25 NaN 
2009-11-27 NaN 
2009-11-30 NaN 
2009-12-01 500.0 
2009-12-02 500.0 
2009-12-03 500.0 
2009-12-04 500.0 
2009-12-07 NaN 
2009-12-08 NaN 
2009-12-09 500.0 
2009-12-10 500.0 
2009-12-11 500.0 
2009-12-14 500.0 
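
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame listed above. The dates and values are copied from the listing; the column name `col` is an assumption, since the listing shows no header.

```python
import numpy as np
import pandas as pd

# Dates copied from the listing above (business days, so some dates are skipped).
dates = pd.to_datetime([
    "2009-11-18", "2009-11-19", "2009-11-20", "2009-11-23", "2009-11-24",
    "2009-11-25", "2009-11-27", "2009-11-30", "2009-12-01", "2009-12-02",
    "2009-12-03", "2009-12-04", "2009-12-07", "2009-12-08", "2009-12-09",
    "2009-12-10", "2009-12-11", "2009-12-14",
])

# Values copied from the listing above, in the same order.
values = [500.0, 500.0, np.nan, 500.0, 500.0, np.nan, np.nan, np.nan, 500.0,
          500.0, 500.0, 500.0, np.nan, np.nan, 500.0, 500.0, 500.0, 500.0]

# "col" is an assumed column name, not taken from the question.
df = pd.DataFrame({"col": values}, index=dates)
print(df)
```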

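My goal is to keep one non-NaN element every n rows. For example, if n is 4, I would keep 2009-11-18 500 and set all the others up to and including 2009-11-23 to 0, then repeat the same for the remaining elements of the array. Is there an efficient, pythonic, vectorized way to do this?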

To make this more concrete, I want the array to end up like this:

2009-11-18 500.0 
2009-11-19 0 
2009-11-20 0 
2009-11-23 0 
2009-11-24 500.0 
2009-11-25 0 
2009-11-27 0 
2009-11-30 0 
2009-12-01 500.0 
2009-12-02 0 
2009-12-03 0 
2009-12-04 0 
2009-12-07 0 
2009-12-08 0 
2009-12-09 500.0 
2009-12-10 0 
2009-12-11 0 
2009-12-14 0 

So if the length of the last group is not 4, its values are ignored? – jezrael

Answers

1

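I think you can first create the groups with np.arange and floor division, then use groupby and idxmin to get the index label of the first non-NaN value in each group. (idxmin skips NaN and returns the first occurrence of the minimum, so this relies on all non-NaN values being equal, as they are here.) Finally, use where to set every row that was not selected to 0. In this version the shorter last group also keeps its first value: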

import numpy as np
import pandas as pd

# label each row with the number of its 4-row block
print(np.arange(len(df.index)) // 4)
[0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4] 

# per block: index label of the first minimum, NaN skipped
idx = df.col.groupby(np.arange(len(df.index)) // 4).idxmin()
print (idx) 
0 2009-11-18 
1 2009-11-24 
2 2009-12-01 
3 2009-12-09 
4 2009-12-11 
Name: col, dtype: datetime64[ns] 

# keep the selected rows, set every other row (including NaN) to 0
df.col = df.col.where(df.index.isin(idx), 0)
print (df) 
       col 
2009-11-18 500.0 
2009-11-19 0.0 
2009-11-20 0.0 
2009-11-23 0.0 
2009-11-24 500.0 
2009-11-25 0.0 
2009-11-27 0.0 
2009-11-30 0.0 
2009-12-01 500.0 
2009-12-02 0.0 
2009-12-03 0.0 
2009-12-04 0.0 
2009-12-07 0.0 
2009-12-08 0.0 
2009-12-09 500.0 
2009-12-10 0.0 
2009-12-11 500.0 
2009-12-14 0.0 
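
One caveat: `idxmin` returns the label of the smallest value in each block, which is the first non-NaN value only when all non-NaN values are equal (as in this data). If the values differ, something like the following sketch might be closer to what was asked. The function name `keep_first_valid_per_block` and the parameter `n` are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

def keep_first_valid_per_block(s, n=4):
    """Keep the first non-NaN value in each block of n rows; set the rest to 0."""
    blocks = np.arange(len(s)) // n            # 0,0,0,0,1,1,1,1,... for n=4
    valid = s.notna()
    # Running count of non-NaN values inside each block.
    seen = valid.astype(int).groupby(blocks).cumsum()
    # A row is kept when it is non-NaN and is the first such row in its block.
    first = valid & (seen == 1)
    return s.where(first, 0)

# Values differ on purpose: idxmin would pick 1.0 in the first block, not 3.0.
s = pd.Series([np.nan, 3.0, 1.0, 2.0, 5.0, np.nan])
result = keep_first_valid_per_block(s)
print(result.tolist())  # [0.0, 3.0, 0.0, 0.0, 5.0, 0.0]
```

This stays vectorized (one groupby plus a cumulative sum) and does not depend on the values themselves, only on where the NaNs are.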

Solution if the last group is shorter than 4 and its values should be omitted:

arr = np.arange(len(df.index)) // 4 
print (arr) 
[0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4] 

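# if the last group is shorter than 4, give its rows the previous group's label
# (the length check keeps a full-length last group from being merged as well)
arr1 = np.where((arr == arr[-1]) & (len(arr) % 4 != 0), arr[-1] - 1, arr)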
print (arr1) 
[0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 3 3] 

idx = df.col.groupby(arr1).idxmin() 
print (idx) 
0 2009-11-18 
1 2009-11-24 
2 2009-12-01 
3 2009-12-09 
Name: col, dtype: datetime64[ns] 

df.col = df.col.where(df.index.isin(idx), 0) 
print (df) 
       col 
2009-11-18 500.0 
2009-11-19 0.0 
2009-11-20 0.0 
2009-11-23 0.0 
2009-11-24 500.0 
2009-11-25 0.0 
2009-11-27 0.0 
2009-11-30 0.0 
2009-12-01 500.0 
2009-12-02 0.0 
2009-12-03 0.0 
2009-12-04 0.0 
2009-12-07 0.0 
2009-12-08 0.0 
2009-12-09 500.0 
2009-12-10 0.0 
2009-12-11 0.0 
2009-12-14 0.0 
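
The relabelling step is only wanted when the last group really is short; a last group of full length should keep its own label. A small sketch that makes this explicit and generalizes the block size, assuming a hypothetical helper `block_labels` (not a NumPy or pandas function):

```python
import numpy as np

def block_labels(length, n=4, merge_short_tail=True):
    """Label rows 0..length-1 by block of n rows.

    If merge_short_tail is True and the last block has fewer than n rows,
    its rows get the label of the previous block.
    """
    labels = np.arange(length) // n
    # Merge only when there is a short tail and a previous block to merge into.
    if merge_short_tail and length % n != 0 and length > n:
        labels[labels == labels[-1]] -= 1
    return labels

print(block_labels(18))  # [0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 3 3]
print(block_labels(16))  # [0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3]
```

The result can be passed to `groupby` in place of `arr1`.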
1

IIUC, you restart your counter each time you reach your next value. In that case I would use a generator. Not vectorized!

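def next4(s):
    # start at the first non-NaN label
    idx = s.first_valid_index()
    while idx is not None:
        loc = s.index.get_loc(idx)
        # emit the kept value as a one-row Series
        yield s.loc[[idx]]
        # resume the search 4 positions further on
        idx = s.iloc[loc+4:].first_valid_index()

# rows that were not yielded are filled with 0
pd.concat(next4(df[1])).reindex(df.index, fill_value=0).to_frame()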
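
The same restart-the-counter idea can also be sketched on positions rather than labels, with the step size as a parameter. This is only an illustration under that reading of the question: `keep_every_n` is a hypothetical name, it is still a plain Python loop (not vectorized), and the sample values below use a default integer index instead of the dates.

```python
import numpy as np
import pandas as pd

def keep_every_n(s, n=4):
    """Keep a non-NaN value, zero the following n-1 rows, then look for the next one."""
    valid = s.notna().to_numpy()
    keep = np.zeros(len(s), dtype=bool)
    i = 0
    while i < len(s):
        if valid[i]:
            keep[i] = True
            i += n          # skip the rest of this window
        else:
            i += 1          # NaN: move on to the next row
    return s.where(keep, 0)

# Values from the question, in order (dates omitted for brevity).
vals = [500.0, 500.0, np.nan, 500.0, 500.0, np.nan, np.nan, np.nan, 500.0,
        500.0, 500.0, 500.0, np.nan, np.nan, 500.0, 500.0, 500.0, 500.0]
out = keep_every_n(pd.Series(vals))
print(np.flatnonzero(out.to_numpy()).tolist())  # [0, 4, 8, 14]
```

Positions 0, 4, 8 and 14 correspond to 2009-11-18, 2009-11-24, 2009-12-01 and 2009-12-09, which are the rows kept in the desired output in the question.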
