Replace NaN values in a pandas.DataFrame with values from a list

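In a Python script using the pandas library, I have a dataset of, say, 100 rows with a feature "X" that contains 36 NaN values, and a list of size 36.

I want to replace all 36 missing values in column "X" with the 36 values from my list.

This may be a silly question, but I went through all the documentation and could not find a way to do this.

Here is an example: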

INPUT

Data: X  Y 
     1  8 
     2  3 
     NaN 2 
     NaN 7 
     1  2 
     NaN 2

FILLER

List: [8, 6, 3]

OUTPUT

Source

2017-02-10 Mean-Street

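Can you provide the input and the expected output? – Shijo

Sure, I have edited my post to add it. –

Are all the 'NaN' values in the same column? How do you want to replace the 'NaN' values with your list? Do you do it sequentially, i.e. replace the first 'NaN' value with the first value in the list, and so on? –

Start with the dataframe df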

print(df) 

    X Y 
0 1.0 8 
1 2.0 3 
2 NaN 2 
3 NaN 7 
4 1.0 2 
5 NaN 2

Define the values to fill with (note: your filler list must have the same number of elements as there are NaN values in your dataframe)

filler = [8, 6, 3]

You can filter the column (the one that contains NaN values) and overwrite the selected rows with your filler:

~~df.X[df.X.isnull()] = filler~~

df.loc[df.X.isnull(), 'X'] = filler

This gives:

print(df) 

    X Y 
0 1.0 8 
1 2.0 3 
2 8.0 2 
3 6.0 7 
4 1.0 2 
5 3.0 2
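
For reference, the steps above can be put together as a self-contained sketch. The helper name `fill_nan_from_list` is hypothetical (it is not part of pandas); it only wraps the `.loc` assignment from this answer and adds an explicit length check, since the assignment needs exactly one filler per NaN.

```python
import numpy as np
import pandas as pd


def fill_nan_from_list(df, column, filler):
    """Return a copy of df with the NaNs in `column` replaced, in row order, by `filler`.

    Hypothetical helper for illustration only; not part of pandas.
    """
    mask = df[column].isna()
    n_missing = int(mask.sum())
    if n_missing != len(filler):
        raise ValueError(f"expected {n_missing} filler values, got {len(filler)}")
    out = df.copy()                 # leave the caller's frame untouched
    out.loc[mask, column] = filler  # boolean mask selects only the NaN rows
    return out


df = pd.DataFrame({"X": [1, 2, np.nan, np.nan, 1, np.nan],
                   "Y": [8, 3, 2, 7, 2, 2]})
filled = fill_nan_from_list(df, "X", [8, 6, 3])
print(filled)
```

Returning a copy is a design choice of this sketch; assigning in place with `df.loc[...] = filler`, as the answer does, works just as well when modifying `df` directly is what you want.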

Source

2017-02-10 20:17:21 bunji

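It works great, thanks, but I get a warning: "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame". That is odd, because I can see that it does modify 'df' ... –

According to the [documentation](http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) listed in the warning, you need to change 'df.X[df.X.isnull()]' to 'df.loc[df.X.isnull(), 'X']' –

@MadPhysicist is right, if you want to avoid the warning. Personally, I tend to use the original syntax because it looks more intuitive (to me), and I just ignore the warning since it does exactly what I want. But if the '.loc' approach looks fine to you, then you should use that one. – bunji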
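
To make the difference between the two spellings concrete, here is a small sketch. The `.loc` form does the selection and the assignment in a single call on `df` itself, so the result does not depend on whether an intermediate object is a view or a copy. As far as I know, in recent pandas versions with Copy-on-Write enabled the chained form `df.X[mask] = ...` no longer updates `df` at all, which is one more reason to prefer `.loc`. The frame `raw` and the filter on `Y` are made up for illustration; they stand in for the common case where `df` was itself sliced out of a larger frame.

```python
import numpy as np
import pandas as pd

# `raw` is a hypothetical larger frame; `df` is derived from it by filtering rows.
raw = pd.DataFrame({"X": [1, 2, np.nan, np.nan, 1, np.nan, 5],
                    "Y": [8, 3, 2, 7, 2, 2, -1]})
df = raw[raw["Y"] >= 0].copy()  # explicit copy: df is an independent frame

filler = [8, 6, 3]
df.loc[df["X"].isna(), "X"] = filler  # one .loc call, no chained indexing

print(df)
```

The explicit `.copy()` is an assumption about intent: it states that `df` should be modified without touching `raw`.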

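This may not be efficient, but it still works :) First find all the indexes of the NaNs, then replace them in a loop. This assumes the list is always larger than the number of NaNs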

import pandas as pd 
import numpy as np 

df = pd.DataFrame({'A': [np.nan, 1, 2], 'B': [10, np.nan, np.nan], 'C': [[20, 21, 22], [23, 24, 25], np.nan]}) 
lst=[12,35,78] 
print(df) # dataframe before the replacement

index = df['B'].index[df['B'].isna()] # index labels of the NaN rows in column B
for item in index: 
    # DataFrame.set_value was removed in pandas 1.0; .at is the scalar setter
    df.at[item, 'B'] = lst[item] # replace the NaN at row label n with lst[n]

print(df) # dataframe after the replacement

    A  B    C 
0 NaN 10.0 [20, 21, 22] 
1 1.0 NaN [23, 24, 25] 
2 2.0 NaN   NaN

Output:

 A  B    C 
0 NaN 10.0 [20, 21, 22] 
1 1.0 35.0 [23, 24, 25] 
2 2.0 78.0   NaN

Source

2017-02-10 20:08:36 Shijo

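Here it would replace the 10 in the first row, and I don't want that: I only want to change the NaN values. –

It won't, it only replaces the NaNs – Shijo

OK, if the index corresponds to the missing rows, you're right, sorry –

You have to use an iterator as an index marker in order to replace your NaNs with the values from your custom list: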
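
The exchange above turns on how the list is indexed. In the loop, the list is looked up by row label, so the NaN in row 1 receives the list element at position 1 rather than the first element. If the intent is instead to consume the list in order (first NaN gets the first value, as in the question), one possible sketch pairs the NaN labels with the list using `zip`. It assumes the list holds at least as many values as there are NaNs; the names `nan_labels`, `label` and `value` are my own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [10, np.nan, np.nan]})
lst = [12, 35, 78]

nan_labels = df.index[df["B"].isna()]  # row labels 1 and 2

# zip stops at the shorter input, so surplus list values are simply unused
for label, value in zip(nan_labels, lst):
    df.at[label, "B"] = value  # scalar assignment by row label and column

print(df)
```

With this pairing the two NaNs receive 12 and 35, whereas indexing the list by row label gives 35 and 78.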

import numpy as np 
import pandas as pd 

your_df = pd.DataFrame({'your_column': [0,1,2,np.nan,4,6,np.nan,np.nan,7,8,np.nan,9]}) # a df with 4 NaN's 
print(your_df)

your_custom_list = [1,3,6,8] # custom list with 4 fillers 

# copy the column into a writable NumPy array, so the loop does not write into the frame directly
your_column_vals = your_df['your_column'].to_numpy(copy=True)

i_custom = 0 # position of the next unused value in the custom list 
for i in range(len(your_column_vals)): 
    if np.isnan(your_column_vals[i]): 
        your_column_vals[i] = your_custom_list[i_custom] 
        i_custom += 1 # move on to the next filler 

your_df['your_column'] = your_column_vals 

print(your_df)

Output:

your_column 
0   0.0 
1   1.0 
2   2.0 
3   NaN 
4   4.0 
5   6.0 
6   NaN 
7   NaN 
8   7.0 
9   8.0 
10   NaN 
11   9.0 
    your_column 
0   0.0 
1   1.0 
2   2.0 
3   1.0 
4   4.0 
5   6.0 
6   3.0 
7   6.0 
8   7.0 
9   8.0 
10   8.0 
11   9.0
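
The explicit counter in the loop above can also be replaced by a boolean mask on the NumPy array, which assigns the fillers to the NaN positions in order. This is a minimal sketch, assuming the number of fillers equals the number of NaNs (NumPy raises an error otherwise); the variable name `vals` is my own.

```python
import numpy as np
import pandas as pd

your_df = pd.DataFrame(
    {"your_column": [0, 1, 2, np.nan, 4, 6, np.nan, np.nan, 7, 8, np.nan, 9]}
)
your_custom_list = [1, 3, 6, 8]

# Work on an explicit copy so the array is writable
vals = your_df["your_column"].to_numpy(copy=True)
vals[np.isnan(vals)] = your_custom_list  # fillers land on the NaN slots in order
your_df["your_column"] = vals

print(your_df)
```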

Source

2017-02-10 20:12:02
