2013-07-23 74 views

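Pandas: accessing the last non-null value

I want to fill the NaNs in a DataFrame with the last valid value within a given group. For example: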

import random as randy
import numpy as np
import pandas as pd

df_size = 10
# .tolist(): random.sample needs a real sequence, not a NumPy array
categories = randy.sample(np.repeat(['Strawberry', 'Apple'], df_size).tolist(), df_size)
values = randy.sample(np.repeat([np.nan, 0, 1], df_size).tolist(), df_size)
df = pd.DataFrame({'category': categories, 'value': values},
                  index=randy.sample(range(10), df_size)).sort_values('category')

This gives:

 category value 
7  Apple  NaN 
6  Apple  1 
4  Apple  0 
5  Apple  NaN 
1  Apple  NaN 
0 Strawberry  1 
8 Strawberry  NaN 
2 Strawberry  0 
3 Strawberry  0 
9 Strawberry  NaN 

And the column I want to compute looks like this:

 category value last_value 
7  Apple  NaN   NaN 
6  Apple  1   NaN 
4  Apple  0   1 
5  Apple  NaN   0 
1  Apple  NaN   0 
0 Strawberry  1   NaN 
8 Strawberry  NaN   1 
2 Strawberry  0   1 
3 Strawberry  0   0 
9 Strawberry  NaN   0 

I tried shift() and iterrows(), but to no avail.

Answers

It looks like you want to do an ffill first and then a shift:

In [11]: df['value'].ffill() 
Out[11]: 
7 NaN 
6  1 
4  0 
5  0 
1  0 
0  1 
8  1 
2  0 
3  0 
9  0 
Name: value, dtype: float64 

In [12]: df['value'].ffill().shift(1) 
Out[12]: 
7 NaN 
6 NaN 
4  1 
5  0 
1  0 
0  0 
8  1 
2  1 
3  0 
9  0 
Name: value, dtype: float64 
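
To see why the ungrouped version is not quite enough, here is a small sketch on a fixed copy of the question's frame (the values are typed in by hand from the table above, since the original data is random). It only illustrates that the carried-forward value leaks across the category boundary; the variable names ungrouped and leaked are made up for this example.

```python
import numpy as np
import pandas as pd

# Fixed copy of the example frame from the question (same row order).
df = pd.DataFrame(
    {
        "category": ["Apple"] * 5 + ["Strawberry"] * 5,
        "value": [np.nan, 1, 0, np.nan, np.nan, 1, np.nan, 0, 0, np.nan],
    },
    index=[7, 6, 4, 5, 1, 0, 8, 2, 3, 9],
)

# ffill carries the last valid value forward; shift(1) then moves everything
# down one row, so each row sees the last valid value strictly before it.
ungrouped = df["value"].ffill().shift(1)

# Row 0 is the first Strawberry row, yet it picks up the last Apple value (0)
# instead of NaN, because nothing here knows about the categories.
leaked = ungrouped.loc[0]
print(leaked)
```

The question's desired output has NaN in that row, which is what the per-group step is for.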

To do this for each group, you first have to groupby the category and then apply this function:

In [13]: g = df.groupby('category') 

In [14]: g['value'].apply(lambda x: x.ffill().shift(1)) 
Out[14]: 
7 NaN 
6 NaN 
4  1 
5  0 
1  0 
0 NaN 
8  1 
2  1 
3  0 
9  0 
dtype: float64 

In [15]: df['last_value'] = g['value'].apply(lambda x: x.ffill().shift(1)) 
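
As a hedged side note for anyone running this on a recent pandas: apply on a groupby may prepend the group key to the result's index, in which case the assignment above no longer lines up with df. A minimal sketch of the same ffill-then-shift idea using transform, which returns a result indexed like the original frame, could look like this. The frame is a hand-typed copy of the question's table, not the original random data.

```python
import numpy as np
import pandas as pd

# Fixed copy of the example frame from the question (same row order).
df = pd.DataFrame(
    {
        "category": ["Apple"] * 5 + ["Strawberry"] * 5,
        "value": [np.nan, 1, 0, np.nan, np.nan, 1, np.nan, 0, 0, np.nan],
    },
    index=[7, 6, 4, 5, 1, 0, 8, 2, 3, 9],
)

# transform runs the function once per category and returns a Series that
# keeps the original index, so it can be assigned back directly.
df["last_value"] = df.groupby("category")["value"].transform(
    lambda s: s.ffill().shift(1)
)
print(df)
```

With this data the new column matches the last_value column requested in the question, including the NaN on the first Strawberry row.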

I think the OP wants to apply this trick over df.groupby("category"), which might explain the third NaN. – DSM

@DSM :) It looks very obvious now! –

Works like a charm, thank you both! – mrbarti