2017-09-21

Sliding windows over data in a pandas DataFrame

I have a dataset that looks like this:

from pandas import DataFrame
df = DataFrame(dict(month=[1, 2, 3, 4, 5, 6], a=[2, 4, 2, 4, 2, 4], b=[3, 5, 6, 3, 4, 6]))


What I want is a function that takes a window size as input and gives me output like the following:

Function: def make_sliding_df(data, size)

  1. If I call make_sliding_df(df, 1), the output should be a DataFrame like this:

     [image: expected output for size 1]

  2. If I call make_sliding_df(df, 2), the output should be a DataFrame like this:

     [image: expected output for size 2]

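I've tried a bunch of things, but nothing has worked so far, so any help would be greatly appreciated. (I've checked several other similar questions, but none of them helped.)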

Answers


Here is one way, using shift, applymap and reduce:

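    In [2007]: from functools import reduce  # Python 3: reduce is no longer a builtin
          ...: 
          ...: def make_sliding(df, N):
          ...:     # one copy of df per offset 0..N, shifted up by that offset (rows past
          ...:     # the end become NaN), with every value wrapped in a one-element list
          ...:     dfs = [df.shift(-i).applymap(lambda x: [x]) for i in range(0, N+1)]
          ...:     # adding DataFrames of lists concatenates the lists cell by cell
          ...:     return reduce(lambda x, y: x.add(y), dfs)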
    
    In [2008]: make_sliding(df, 1) 
    Out[2008]: 
          a   b  month 
    0 [2, 4.0] [3, 5.0] [1, 2.0] 
    1 [4, 2.0] [5, 6.0] [2, 3.0] 
    2 [2, 4.0] [6, 3.0] [3, 4.0] 
    3 [4, 2.0] [3, 4.0] [4, 5.0] 
    4 [2, 4.0] [4, 6.0] [5, 6.0] 
    5 [4, nan] [6, nan] [6, nan] 
    
    In [2009]: make_sliding(df, 2) 
    Out[2009]: 
           a    b   month 
    0 [2, 4.0, 2.0] [3, 5.0, 6.0] [1, 2.0, 3.0] 
    1 [4, 2.0, 4.0] [5, 6.0, 3.0] [2, 3.0, 4.0] 
    2 [2, 4.0, 2.0] [6, 3.0, 4.0] [3, 4.0, 5.0] 
    3 [4, 2.0, 4.0] [3, 4.0, 6.0] [4, 5.0, 6.0] 
    4 [2, 4.0, nan] [4, 6.0, nan] [5, 6.0, nan] 
    5 [4, nan, nan] [6, nan, nan] [6, nan, nan] 
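To see why the shift/reduce answer works, here is a minimal sketch of the same steps on a single column (a Series) rather than the whole DataFrame, assuming Python 3, where reduce has to be imported from functools. The names s, N, shifted, wrapped and windows are just for illustration:

```python
from functools import reduce

import pandas as pd

s = pd.Series([2, 4, 2, 4, 2, 4])  # column "a" from the question
N = 1

# shift(-i) moves every value up by i rows; the last i rows become NaN,
# which also makes those shifted Series float (hence 4.0 and nan above).
shifted = [s.shift(-i) for i in range(N + 1)]

# Wrap each value in a one-element list so that "+" means list concatenation.
wrapped = [t.apply(lambda x: [x]) for t in shifted]

# Adding object-dtype Series of lists concatenates them element by element.
windows = reduce(lambda x, y: x.add(y), wrapped)
print(windows.tolist())
```

Row 0 ends up as [2, 4.0] and row 5 as [4, nan], matching column a of make_sliding(df, 1). On pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map, so df.shift(-i).applymap(...) can be written df.shift(-i).map(...).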
    

Here is a way using numpy. It may look ugly, but it's my first attempt at using numpy...

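    import numpy as np
    import pandas as pd

    def make_sliding_df(df, step=1, width=2):
        l = []
        for x in df.columns:
            a = np.array(df[x])
            # pad the end with NaN so every window start has width values to read
            # (step*(len(a)-1) + width - len(a) NaNs; this is width-1 when step=1)
            b = np.append(a, [np.nan] * (step * (len(a) - 1) + width - len(a)))
            # row i of the index matrix is step*i, step*i+1, ..., step*i+width-1
            idx = np.arange(width)[None, :] + step * np.arange(len(a))[:, None]
            l.append(b[idx].tolist())
        newdf = pd.DataFrame(data=l).T
        newdf.columns = df.columns
        return newdf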
    
    make_sliding_df(df,step=1,width=2) 
    Out[157]: 
          a   b  month 
    0 [2.0, 4.0] [3.0, 5.0] [1.0, 2.0] 
    1 [4.0, 2.0] [5.0, 6.0] [2.0, 3.0] 
    2 [2.0, 4.0] [6.0, 3.0] [3.0, 4.0] 
    3 [4.0, 2.0] [3.0, 4.0] [4.0, 5.0] 
    4 [2.0, 4.0] [4.0, 6.0] [5.0, 6.0] 
    5 [4.0, nan] [6.0, nan] [6.0, nan] 
    
    make_sliding_df(df,step=1,width=3) 
    Out[158]: 
           a    b   month 
    0 [2.0, 4.0, 2.0] [3.0, 5.0, 6.0] [1.0, 2.0, 3.0] 
    1 [4.0, 2.0, 4.0] [5.0, 6.0, 3.0] [2.0, 3.0, 4.0] 
    2 [2.0, 4.0, 2.0] [6.0, 3.0, 4.0] [3.0, 4.0, 5.0] 
    3 [4.0, 2.0, 4.0] [3.0, 4.0, 6.0] [4.0, 5.0, 6.0] 
    4 [2.0, 4.0, nan] [4.0, 6.0, nan] [5.0, 6.0, nan] 
    5 [4.0, nan, nan] [6.0, nan, nan] [6.0, nan, nan]
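The core of the numpy answer is the index matrix built by broadcasting a row of offsets against a column of start positions. A small sketch on a single column, assuming step=1 and width=2 as in the first call above (a, b, idx and windows are illustrative names), shows what that matrix contains:

```python
import numpy as np

a = np.array([2, 4, 2, 4, 2, 4], dtype=float)  # column "a" from the question
width, step = 2, 1
n = len(a)

# Pad the tail with NaN so the last windows have something to read.
pad = step * (n - 1) + width - n
b = np.append(a, [np.nan] * pad)

# (1, width) offsets + (n, 1) starts broadcast to an (n, width) matrix
# with idx[i, j] = step * i + j.
idx = np.arange(width)[None, :] + step * np.arange(n)[:, None]
windows = b[idx]  # fancy indexing picks one window per row
print(idx[:3])
# [[0 1]
#  [1 2]
#  [2 3]]
```

On NumPy 1.20 or later, numpy.lib.stride_tricks.sliding_window_view(b, width) returns the same step-1 windows as a read-only view, without building idx by hand.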