2016-08-15 95 views

Filling NaN based on a MultiIndex in pandas

I have a pandas DataFrame in which I want to fill in some NaN values.

import pandas as pd 

tuples = [('a', 1990),('a', 1994),('a',1996),('b',1992),('b',1997),('c',2001)] 
index = pd.MultiIndex.from_tuples(tuples, names = ['Type', 'Year']) 
vals = ['NaN','NaN','SomeName','NaN','SomeOtherName','SomeThirdName'] 
df = pd.DataFrame(vals, index=index) 

print(df) 

                       0
Type Year
a    1990            NaN
     1994            NaN
     1996       SomeName
b    1992            NaN
     1997  SomeOtherName
c    2001  SomeThirdName

The output I want is:

Type Year
a    1990       SomeName
     1994       SomeName
     1996       SomeName
b    1992  SomeOtherName
     1997  SomeOtherName
c    2001  SomeThirdName

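This needs to be done on a much larger DataFrame (millions of rows), where each 'Type' can have between 1 and 5 unique 'Year' values and the name value appears only for the most recent year. For performance reasons, I am trying to avoid iterating over rows.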

Answer


You can sort the DataFrame by its index in descending order and then forward-fill it with ffill:

import pandas as pd 
df.sort_index(level=[0, 1], ascending=False).ffill()

#                        0
# Type Year
# c    2001  SomeThirdName
# b    1997  SomeOtherName
#      1992  SomeOtherName
# a    1996       SomeName
#      1994       SomeName
#      1990       SomeName

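Note: the example data does not actually contain np.nan values, only the string "NaN". For ffill to work, you first need to replace the "NaN" strings with np.nan: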

import numpy as np 
df[0] = np.where(df[0] == "NaN", np.nan, df[0]) 

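Or, as @ayhan suggested, replace the "NaN" strings with np.nan and then call df.bfill(). Because the original index is already sorted by ascending year within each type, no re-sorting is needed.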
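For reference, here is a minimal end-to-end sketch of the bfill route using the question's sample data. The names `cleaned`, `filled`, and `filled_by_type` are made up for illustration. The group-wise variant is not part of the original answer; it is an extra safeguard for the case where some 'Type' might have no name in its latest year, since a plain bfill would then borrow the name from the next type.

```python
import numpy as np
import pandas as pd

tuples = [('a', 1990), ('a', 1994), ('a', 1996),
          ('b', 1992), ('b', 1997), ('c', 2001)]
index = pd.MultiIndex.from_tuples(tuples, names=['Type', 'Year'])
vals = ['NaN', 'NaN', 'SomeName', 'NaN', 'SomeOtherName', 'SomeThirdName']
df = pd.DataFrame(vals, index=index)

# Turn the placeholder strings into real missing values.
cleaned = df.replace('NaN', np.nan)

# The rows are already ordered by Type, then ascending Year, so a
# backward fill pulls each name up from the most recent year.
filled = cleaned.bfill()

# Assumption (not in the original answer): filling within each Type
# keeps a value from leaking across types if a latest-year name is missing.
filled_by_type = cleaned.groupby(level='Type').bfill()

print(filled)
```

Both variants are vectorized and keep the original row order, so neither iterates over rows in Python.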


Or just '.bfill()' directly? :) – ayhan


@ayhan That is exactly what is needed here. – Psidom