Python pandas DataFrame: interpolating missing data

I have a dataset like the one below. We only have data for the last day of each month, and I am trying to interpolate the rest. Is this the right approach?

Date Australia China 
2011-01-01 NaN NaN 
2011-01-02 NaN NaN 
-   -  - 
-   -  - 
2011-01-31 4.75 5.81 
2011-02-01 NaN NaN 
2011-02-02 NaN NaN 
-   -  - 
-   -  - 
2011-02-28 4.75 5.81 
2011-03-01 NaN NaN 
2011-03-02 NaN NaN 
-   -  - 
-   -  - 
2011-03-31 4.75 6.06 
2011-04-01 NaN NaN 
2011-04-02 NaN NaN 
-   -  - 
-   -  - 
2011-04-30 4.75 6.06

To interpolate the missing NaN values in this DataFrame I am using the following code:

import pandas as pd 
df = pd.read_csv("data.csv", index_col="Date") 
df.index = pd.DatetimeIndex(df.index) 
df.interpolate(method='linear', axis=0).ffill().bfill()

But I get the error "TypeError: Cannot interpolate with all NaNs."

What could be wrong here, and how can I fix it?

Thanks.

Source

2016-01-09 Unnikrishnan

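The error is self-explanatory. You can try dropping the NaN values as described here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html – station

Hi, thanks for your help. What can I fill into those rows instead of NaN? – Unnikrishnan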

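@Unnikrishnan I think a good answer has already been given. Your data is very sparse, so you may want to question whether interpolating this much data is a good idea at all. How sure are you that the interpolated values will be anywhere near correct? –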

You can try converting the DataFrame to float with astype:

import pandas as pd 

df = pd.read_csv("data.csv", index_col=['Date'], parse_dates=['Date']) 

print(df)

      Australia China 
Date       
2011-01-31  4.75 5.81 
2011-02-28  4.75 5.81 
2011-03-31  4.75 6.06 
2011-04-30  4.75 6.06 

df = df.reindex(pd.date_range("2011-01-01", "2011-10-31"), fill_value="NaN") 

#convert to float 
df = df.astype(float) 

df = df.interpolate(method='linear', axis=0).ffill().bfill()

print(df)

      Australia China 
2011-01-01  4.75 5.81 
2011-01-02  4.75 5.81 
2011-01-03  4.75 5.81 
2011-01-04  4.75 5.81 
2011-01-05  4.75 5.81 
2011-01-06  4.75 5.81 
2011-01-07  4.75 5.81 
2011-01-08  4.75 5.81 
2011-01-09  4.75 5.81 
2011-01-10  4.75 5.81 
2011-01-11  4.75 5.81 
2011-01-12  4.75 5.81 
2011-01-13  4.75 5.81 
2011-01-14  4.75 5.81 
2011-01-15  4.75 5.81 
2011-01-16  4.75 5.81 
2011-01-17  4.75 5.81 
2011-01-18  4.75 5.81 
2011-01-19  4.75 5.81 
2011-01-20  4.75 5.81 
2011-01-21  4.75 5.81 
2011-01-22  4.75 5.81 
2011-01-23  4.75 5.81 
2011-01-24  4.75 5.81 
2011-01-25  4.75 5.81 
2011-01-26  4.75 5.81 
2011-01-27  4.75 5.81 
2011-01-28  4.75 5.81 
2011-01-29  4.75 5.81 
2011-01-30  4.75 5.81 
...    ... ... 
2011-10-02  4.75 6.06 
2011-10-03  4.75 6.06 
2011-10-04  4.75 6.06 
2011-10-05  4.75 6.06 
2011-10-06  4.75 6.06 
2011-10-07  4.75 6.06 
2011-10-08  4.75 6.06 
2011-10-09  4.75 6.06 
2011-10-10  4.75 6.06 
2011-10-11  4.75 6.06 
2011-10-12  4.75 6.06 
2011-10-13  4.75 6.06 
2011-10-14  4.75 6.06 
2011-10-15  4.75 6.06 
2011-10-16  4.75 6.06 
2011-10-17  4.75 6.06 
2011-10-18  4.75 6.06 
2011-10-19  4.75 6.06 
2011-10-20  4.75 6.06 
2011-10-21  4.75 6.06 
2011-10-22  4.75 6.06 
2011-10-23  4.75 6.06 
2011-10-24  4.75 6.06 
2011-10-25  4.75 6.06 
2011-10-26  4.75 6.06 
2011-10-27  4.75 6.06 
2011-10-28  4.75 6.06 
2011-10-29  4.75 6.06 
2011-10-30  4.75 6.06 
2011-10-31  4.75 6.06 

[304 rows x 2 columns]
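
To illustrate why the float conversion matters, here is a minimal sketch. It assumes the error comes from the columns being stored as object dtype rather than float, which is what the astype fix above suggests but which the question does not confirm. The month-end values are inlined from the question in place of data.csv, and the names monthly, daily, as_object and filled are made up for this example.

```python
import pandas as pd

# Month-end values copied from the question (inlined instead of reading data.csv).
monthly = pd.DataFrame(
    {"Australia": [4.75, 4.75, 4.75, 4.75], "China": [5.81, 5.81, 6.06, 6.06]},
    index=pd.to_datetime(["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-30"]),
)

# Add the missing calendar days; the new rows hold NaN.
daily = monthly.reindex(pd.date_range("2011-01-01", "2011-04-30"))

# Simulate columns that are not stored as numbers (an assumption about the cause).
as_object = daily.astype(object)
print(as_object.dtypes)

# Convert back to float before interpolating, as in the answer above.
filled = as_object.astype(float).interpolate(method="linear", axis=0).bfill()
print(filled.loc["2011-03-13":"2011-03-17"])
```

Checking df.dtypes right after read_csv or reindex is a quick way to see whether a column ended up as object before calling interpolate.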

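You can also omit ffill(), because after interpolation the only NaN values left are in the first rows of the DataFrame. So you can change: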

df = df.interpolate(method='linear', axis=0).ffill().bfill()

to:

df = df.interpolate(method='linear', axis=0).bfill()
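
A small check of that claim, as a sketch under the assumption that interpolate() is called with its default settings, where trailing NaN values are filled from the last valid observation and only leading ones remain. The data is again inlined from the question, and the variable names are illustrative only.

```python
import pandas as pd

# Month-end values copied from the question.
monthly = pd.DataFrame(
    {"Australia": [4.75, 4.75, 4.75, 4.75], "China": [5.81, 5.81, 6.06, 6.06]},
    index=pd.to_datetime(["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-30"]),
)
daily = monthly.reindex(pd.date_range("2011-01-01", "2011-10-31"))

# After interpolation alone, check which end of the frame still holds NaN.
interpolated = daily.interpolate(method="linear", axis=0)
leading_nan = interpolated.iloc[0].isna().all()
trailing_nan = interpolated.iloc[-1].isna().any()

# Compare the two variants from the answer.
with_ffill = interpolated.ffill().bfill()
without_ffill = interpolated.bfill()
same = with_ffill.equals(without_ffill)
print(leading_nan, trailing_nan, same)
```

If interpolate() is called with a different limit_direction or limit_area, trailing gaps may stay empty, and then ffill() is no longer redundant.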

Source

2016-01-09 14:39:44 jezrael

Thank you very much, it works as expected! – Unnikrishnan

You can try removing the NaN values from the dataset before interpolating.

import pandas as pd 
df = pd.read_csv("data.csv", index_col="Date") 
df = df.dropna() 
df.index = pd.DatetimeIndex(df.index) 
df.interpolate(method='linear', axis=0).ffill().bfill()

Source

2016-01-09 11:54:58 station

Those NaN rows are not in the CSV. I added them with df = df.reindex(pd.date_range("2011-01-01", "2011-10-31"), fill_value=np.nan). How else could I populate those rows? – Unnikrishnan
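
One possible alternative to an explicit reindex, offered only as a sketch: DataFrame.asfreq("D") inserts the missing daily rows between the first and last observed dates. Note that it starts at 2011-01-31 rather than 2011-01-01, so it does not cover the days before the first observation. The data is inlined from the question, and the variable names are illustrative.

```python
import pandas as pd

# Month-end values copied from the question.
monthly = pd.DataFrame(
    {"Australia": [4.75, 4.75, 4.75, 4.75], "China": [5.81, 5.81, 6.06, 6.06]},
    index=pd.to_datetime(["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-30"]),
)

# Upsample to daily frequency; the days without an observation become NaN.
daily = monthly.asfreq("D")

# Every gap now lies between two observations, so no ffill/bfill is needed.
filled = daily.interpolate(method="linear", axis=0)
print(filled.head())
```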
