2013-10-01 18 views
16

我有時間的索引數據:熊貓 - Extend DataFrame的索引將新行的所有列設置爲NaN?

df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) }) 
df2 = df2.set_index('day') 
df2 
       b 
day    
2012-01-01 0.22 
2012-01-03 0.30 

什麼是延長這個數據幀,使其具有每天一排在2012年1月(說),其中所有列設置爲NaN(最好的辦法這裏只有b)我們沒有數據?

所以期望的結果將是:

   b 
day    
2012-01-01 0.22 
2012-01-02 NaN 
2012-01-03 0.30 
2012-01-04 NaN 
... 
2012-01-31 NaN 

非常感謝!

回答

3

按你們的要求

df3 = df2.asfreq('D') 
df3 

Out[16]: 
       b 
2012-01-01 0.22 
2012-01-02 NaN 
2012-01-03 0.30 

要回答你的第二部分,你可以重新取樣日新月異的頻率,而無需指定fill_method參數遺漏值將NaN填滿,我想不出一個更優雅的方式瞬間:

df3 = DataFrame({ 'day': Series([date(2012, 1, 4), date(2012, 1, 31)])}) 
df3.set_index('day',inplace=True) 
merged = df2.append(df3) 
merged = merged.asfreq('D') 
merged 


Out[46]: 
       b 
2012-01-01 0.22 
2012-01-02 NaN 
2012-01-03 0.30 
2012-01-04 NaN 
2012-01-05 NaN 
2012-01-06 NaN 
2012-01-07 NaN 
2012-01-08 NaN 
2012-01-09 NaN 
2012-01-10 NaN 
2012-01-11 NaN 
2012-01-12 NaN 
2012-01-13 NaN 
2012-01-14 NaN 
2012-01-15 NaN 
2012-01-16 NaN 
2012-01-17 NaN 
2012-01-18 NaN 
2012-01-19 NaN 
2012-01-20 NaN 
2012-01-21 NaN 
2012-01-22 NaN 
2012-01-23 NaN 
2012-01-24 NaN 
2012-01-25 NaN 
2012-01-26 NaN 
2012-01-27 NaN 
2012-01-28 NaN 
2012-01-29 NaN 
2012-01-30 NaN 
2012-01-31 NaN 

此構造第二時間序列,然後我們只是追加和之前調用asfreq('D')

+0

感謝 - 這是偉大的,以填補漏洞,但我怎麼能擴展到'2012-01-31' (說)。 – paul

+0

嗯。但是,如果我在原始時間系列中有多個孔/間隙,那麼這不再起作用。 – paul

+0

@paul是的我的答案在這方面有限,我想不出更通用的方法。如果可以的話,首先創建帶有所有期望值的DataFrame會更好一些,我會有一個解決方法,看看我能不能找到更好的東西 – EdChum

17

使用此:

ix = pd.DatetimeIndex(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D') 
df2.reindex(ix) 

其中給出:

   b 
2012-01-01 0.22 
2012-01-02 NaN 
2012-01-03 0.30 
2012-01-04 NaN 
2012-01-05 NaN 
[...] 
2012-01-29 NaN 
2012-01-30 NaN 
2012-01-31 NaN 
2

這裏的另一種選擇: 加上一個NaN記錄你想要的最後一天,然後重新取樣。這種重採樣方式將填補你缺失的日期。

起始幀:

import pandas as pd 
import numpy as np 
from datetime import date 

df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) }) 
df2= df2.set_index('day') 
df2 

Out: 
        b 
    day 
    2012-01-01 0.22 
    2012-01-03 0.30 

填充框架:

df2 = df2.set_value(date(2012,1,31),'b',np.float('nan')) 
df2.asfreq('D') 

Out: 
       b 
    day 
    2012-01-01 0.22 
    2012-01-02 NaN 
    2012-01-03 0.30 
    2012-01-04 NaN 
    2012-01-05 NaN 
    2012-01-06 NaN 
    2012-01-07 NaN 
    2012-01-08 NaN 
    2012-01-09 NaN 
    2012-01-10 NaN 
    2012-01-11 NaN 
    2012-01-12 NaN 
    2012-01-13 NaN 
    2012-01-14 NaN 
    2012-01-15 NaN 
    2012-01-16 NaN 
    2012-01-17 NaN 
    2012-01-18 NaN 
    2012-01-19 NaN 
    2012-01-20 NaN 
    2012-01-21 NaN 
    2012-01-22 NaN 
    2012-01-23 NaN 
    2012-01-24 NaN 
    2012-01-25 NaN 
    2012-01-26 NaN 
    2012-01-27 NaN 
    2012-01-28 NaN 
    2012-01-29 NaN 
    2012-01-30 NaN 
    2012-01-31 NaN