Python pandas: pad rows for missing/skipped dates

I have the following DataFrame:

date  my_count 
-------------------------- 
2017-01-01   6 
2017-01-04   5 
2017-01-05   3 
2017-01-08   8

I would like to pad the skipped dates with my_count = 0, so that the padded DataFrame looks like this:

date  my_count 
-------------------------- 
2017-01-01   6 
2017-01-02   0 
2017-01-03   0 
2017-01-04   5 
2017-01-05   3 
2017-01-06   0 
2017-01-07   0 
2017-01-08   8

Other than checking the DataFrame row by row, is there a better way to do this? Thanks!

Source

2017-07-04 Edamame

First option: resample,

df['date'] = pd.to_datetime(df['date']) 
df = df.set_index('date') 

print(df.resample('D').sum().fillna(0).reset_index()) 

     date my_count 
0 2017-01-01  6.0 
1 2017-01-02  0.0 
2 2017-01-03  0.0 
3 2017-01-04  5.0 
4 2017-01-05  3.0 
5 2017-01-06  0.0 
6 2017-01-07  0.0 
7 2017-01-08  8.0
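As a self-contained sketch of the resample option, the snippet below rebuilds the sample frame from the question and pads it to daily frequency. The explicit fillna(0).astype(int) step is an addition of this sketch, not part of the answer: it assumes you want integer counts whether your pandas version returns NaN or 0 for days without rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-04", "2017-01-05", "2017-01-08"],
    "my_count": [6, 5, 3, 8],
})
df["date"] = pd.to_datetime(df["date"])

# Resample to daily frequency; days without rows become empty bins.
daily = df.set_index("date").resample("D").sum()

# Empty bins may come back as NaN or 0 depending on the pandas version,
# so fill and cast explicitly to get integer counts either way.
daily["my_count"] = daily["my_count"].fillna(0).astype(int)
result = daily.reset_index()
print(result)
```
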

Second option: reindex with date_range,

df['date'] = pd.to_datetime(df['date']) 
df = df.set_index('date') 

print(df.reindex(pd.date_range('2017-01-01', '2017-01-08')).fillna(0)) 

      my_count 
2017-01-01  6.0 
2017-01-02  0.0 
2017-01-03  0.0 
2017-01-04  5.0 
2017-01-05  3.0 
2017-01-06  0.0 
2017-01-07  0.0 
2017-01-08  8.0

Source

2017-07-04 23:39:04 su79eu7k

`reindex` has a `fill_value` parameter. If you use it, you never get `nan`, so the column is not cast to float. `df.reindex(pd.date_range('2017-01-01', '2017-01-08'), fill_value=0)` – piRSquared
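To illustrate the point made in the comment above, here is a minimal sketch comparing the two spellings on the question's data. The variable names via_fillna and via_fill_value are made up for this example.

```python
import pandas as pd

# Sample data from the question, already indexed by date.
df = pd.DataFrame(
    {"my_count": [6, 5, 3, 8]},
    index=pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05", "2017-01-08"]),
)
rng = pd.date_range("2017-01-01", "2017-01-08")

# reindex first inserts NaN, which forces the column to float.
via_fillna = df.reindex(rng).fillna(0)

# fill_value supplies 0 directly, so the integer dtype survives.
via_fill_value = df.reindex(rng, fill_value=0)

print(via_fillna["my_count"].dtype, via_fill_value["my_count"].dtype)
```

Both frames hold the same values; only the dtype differs, which is why the outputs in the answer above show 6.0 rather than 6.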

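If the DatetimeIndex values are unique, use:

You can use asfreq, or reindex with a date_range built from the min and max of the index, or from its first and last values (if the DatetimeIndex is sorted):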

df['date'] = pd.to_datetime(df['date']) 
df = df.set_index('date') 

print(df.asfreq('D', fill_value=0).reset_index()) 
     date my_count 
0 2017-01-01   6 
1 2017-01-02   0 
2 2017-01-03   0 
3 2017-01-04   5 
4 2017-01-05   3 
5 2017-01-06   0 
6 2017-01-07   0 
7 2017-01-08   8 

rng = pd.date_range(df.index.min(), df.index.max()) 
#alternative 
#rng = pd.date_range(df.index[0], df.index[-1]) 
print(df.reindex(rng, fill_value=0).rename_axis('date').reset_index()) 
     date my_count 
0 2017-01-01   6 
1 2017-01-02   0 
2 2017-01-03   0 
3 2017-01-04   5 
4 2017-01-05   3 
5 2017-01-06   0 
6 2017-01-07   0 
7 2017-01-08   8

If the DatetimeIndex is not unique, you get:

ValueError: cannot reindex from a duplicate axis

In that case you need resample, or groupby with Grouper, together with an aggregation function such as mean, and finally replace the NaNs with fillna:

print (df) 
     date my_count 
0 2017-01-01   4 <-duplicate date 
1 2017-01-01   6 <-duplicate date 
2 2017-01-04   5 
3 2017-01-05   3 
4 2017-01-08   8 

df['date'] = pd.to_datetime(df['date']) 

print(df.resample('D', on='date')['my_count'].mean().fillna(0).reset_index()) 
     date my_count 
0 2017-01-01  5.0 
1 2017-01-02  0.0 
2 2017-01-03  0.0 
3 2017-01-04  5.0 
4 2017-01-05  3.0 
5 2017-01-06  0.0 
6 2017-01-07  0.0 
7 2017-01-08  8.0 

df = df.set_index('date') 
print(df.groupby(pd.Grouper(freq='D'))['my_count'].mean().fillna(0).reset_index()) 
     date my_count 
0 2017-01-01  5.0 
1 2017-01-02  0.0 
2 2017-01-03  0.0 
3 2017-01-04  5.0 
4 2017-01-05  3.0 
5 2017-01-06  0.0 
6 2017-01-07  0.0 
7 2017-01-08  8.0
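The sketch below reproduces the duplicate-date case end to end. It first shows that reindex fails on the duplicated index, then aggregates before padding. Note that it uses sum rather than the mean shown above. That is an assumption on my part: for a column of counts, adding the duplicate rows may be what you want, but pick whichever aggregation fits your data.

```python
import pandas as pd

# Sample data with a duplicated date, as in the answer above.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2017-01-04", "2017-01-05", "2017-01-08"]
    ),
    "my_count": [4, 6, 5, 3, 8],
})
indexed = df.set_index("date")
rng = pd.date_range(indexed.index.min(), indexed.index.max())

# Reindexing an index with duplicate labels raises ValueError.
try:
    indexed.reindex(rng, fill_value=0)
    reindex_failed = False
except ValueError:
    reindex_failed = True

# Aggregate duplicates first; sum (an assumption, the answer uses mean)
# adds the two 2017-01-01 rows together instead of averaging them.
summed = (
    df.resample("D", on="date")["my_count"]
    .sum()
    .fillna(0)
    .astype(int)
    .reset_index()
)
print(reindex_failed)
print(summed)
```
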

Source

2017-07-05 04:42:10 jezrael

Thanks! But I get this error: asfreq() got an unexpected keyword argument 'fill_value'. Any idea? – Edamame

What is your pandas version? In pandas 0.20.2 it works perfectly for me. – jezrael

I see. I have pandas 0.19.1 ... – Edamame
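If upgrading is not an option, one possible workaround is to call asfreq without fill_value and fill the gaps afterwards. This is only a sketch: I have not run it on pandas 0.19.1, and it assumes the column contains no NaN values of its own that should be preserved.

```python
import pandas as pd

# Sample data from the question, already indexed by date.
df = pd.DataFrame(
    {"my_count": [6, 5, 3, 8]},
    index=pd.to_datetime(["2017-01-01", "2017-01-04", "2017-01-05", "2017-01-08"]),
)

# asfreq without fill_value leaves NaN in the new rows;
# fillna(0) then astype(int) restores integer counts.
padded = (
    df.asfreq("D")
    .fillna(0)
    .astype(int)
    .rename_axis("date")
    .reset_index()
)
print(padded)
```
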
