2016-06-16 106 views
2

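Fill null values with non-null values from other columns

Given a DataFrame with several similar columns, some of which contain null values, how can I dynamically fill the nulls in one column with non-null values from the other columns, without explicitly naming those other columns? For example, take the first column, category1, and fill its empty rows with values from the other columns in the same row.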

import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}

df = pd.DataFrame(data)
df = df.set_index('year')  # use the year as the row index
df

      category1  category2  category3
year
2010        NaN       10.0       10.0
2011       21.0       21.0       21.0
2012        NaN       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0

After filling category1:

      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0

Answers

0

You can use first_valid_index, with a condition to handle rows in which every value is NaN:

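def f(x):
    # Label of the first non-null value in the row, or None if the row is all NaN
    idx = x.first_valid_index()
    if idx is None:
        return None
    return x[idx]

# Apply row by row; the result is stored in a new column 'a'
df['a'] = df.apply(f, axis=1)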

print(df)
      category1  category2  category3     a
year
2010        NaN       10.0       10.0  10.0
2011       21.0       21.0       21.0  21.0
2012        NaN       20.0       20.0  20.0
2013       10.0       10.0       10.0  10.0
2014        NaN        NaN        NaN   NaN
2015       30.0       30.0       30.0  30.0
2016       31.0        NaN       31.0  31.0
2017       45.0       45.0       45.0  45.0
2018       23.0       23.0       23.0  23.0
2019       56.0       56.0       56.0  56.0
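
The same idea can also be written without a row-wise apply. The following is only a sketch, not part of the original answer: back-filling along each row with bfill(axis=1) pulls the first non-null value of the row into the leftmost column, so no column names other than the target need to be spelled out. The name first_valid is illustrative, and the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

# Back-fill each row from right to left; the first column then holds the
# first non-null value of that row (NaN if the whole row is NaN).
first_valid = df.bfill(axis=1).iloc[:, 0]

# Fill only the gaps in category1; existing values are left untouched.
df['category1'] = df['category1'].fillna(first_valid)
print(df)
```

Rows such as 2014, where every column is NaN, stay NaN, which matches the expected output in the question.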
+0

Thanks @jezrael, I have updated the question. I meant filling dynamically from the other columns without naming them explicitly. – ArchieTiger

+0

I have edited the answer, please check it. – jezrael

+0

It works, thanks! – ArchieTiger

1

IIUC, you can do it like this:

In [369]: df['category1'] = df['category1'].fillna(df['category2']) 

In [370]: df 
Out[370]: 
      category1  category2  category3
year
2010       10.0       10.0       10.0
2011       21.0       21.0       21.0
2012       20.0       20.0       20.0
2013       10.0       10.0       10.0
2014        NaN        NaN        NaN
2015       30.0       30.0       30.0
2016       31.0        NaN       31.0
2017       45.0       45.0       45.0
2018       23.0       23.0       23.0
2019       56.0       56.0       56.0
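
To avoid hard-coding category2, the same fillna call can be repeated over every other column, whatever it is called. This is a minimal sketch rather than the answerer's own code; it assumes the target column is category1 and that columns further to the left should take priority. The names target and others are illustrative.

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'category1': [None, 21, None, 10, None, 30, 31, 45, 23, 56],
        'category2': [10, 21, 20, 10, None, 30, None, 45, 23, 56],
        'category3': [10, 21, 20, 10, None, 30, 31, 45, 23, 56]}
df = pd.DataFrame(data).set_index('year')

target = 'category1'
others = df.columns.drop(target)  # every column except the target

# Each pass fills only the gaps that are still open, so earlier
# columns win over later ones.
for col in others:
    df[target] = df[target].fillna(df[col])

print(df)
```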
+0

How can this be filled dynamically, given that the other column names may be unknown? – ArchieTiger

+0

@user1128088, I don't understand. Could you give an example? – MaxU

+0

...without hard-coding 'category2' – ArchieTiger

0

Try this:

df['category1'] = df['category1'].fillna(df.median(axis=1))
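
It helps to see what this computes. df.median(axis=1) is the row-wise median of the non-null values, so it reproduces the expected result only when the other columns agree within each row, as they do in the sample data. The small frame below is made up for illustration and is not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: in the second row the other columns disagree.
demo = pd.DataFrame({'category1': [np.nan, np.nan],
                     'category2': [10.0, 10.0],
                     'category3': [10.0, 30.0]})

# The median skips NaN by default, so each gap receives the median
# of the remaining values in its row.
filled = demo['category1'].fillna(demo.median(axis=1))
print(filled.tolist())  # [10.0, 20.0]
```

In the second row the filled value is 20.0, which appears in neither of the other columns.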