2017-09-24 54 views

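Average every two consecutive index values (every 2 min) in a pandas DataFrame

I know similar questions have already been answered, but I can't work out why none of the solutions work for me. My sample dataset: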

TimeStamp  340       341       342
10:27:00   1.953036  2.110234  1.981548
10:28:00   1.973408  2.046361  1.806923
10:29:00   0.000000  0.000000  0.014881
10:30:00   2.567976  3.169928  3.479591

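I want the mean of each column over every two minutes. df.groupby looks like a neat way to do it, but for some reason it makes my TimeStamp column disappear. Any help is much appreciated.

Expected output: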

TimeStamp  340   341   342  
10:27:30  1.963222  2.078298  1.894235    
10:29:30  1.283988  1.584964  1.747236 

Code I tried:

import pandas as pd
import numpy as np

path = '/Users/username/Desktop/Model/'
file1 = 'filename.csv'

df = pd.read_csv(path + file1, skipinitialspace=True)

df['TimeStamp'] = pd.to_timedelta(df['TimeStamp'])
df['TimeStamp'] = df['TimeStamp'].dt.floor('min')
df.set_index('TimeStamp')
rowF = len(df['TimeStamp'])

# Average every two min
newdf = df.groupby(np.arange(len(df.index))//2).mean()
print(newdf)

Answer


Set the time as the index:

df.set_index(pd.to_timedelta(df.TimeStamp), inplace=True) 

Then use resample to aggregate over every two minutes:

df.resample("2min").mean().reset_index() 

# TimeStamp  340  341  342 
#0 10:27:00 1.963222 2.078298 1.894235 
#1 10:29:00 1.283988 1.584964 1.747236 
#2 10:31:00  NaN  NaN  NaN 
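
For anyone who wants to run the two steps above end to end, here is a minimal sketch that rebuilds the four sample rows from the question by hand. The DataFrame literal and the names `stamps` and `two_min` are illustrative, not the asker's. It pops the string `TimeStamp` column before setting the index, on the assumption that newer pandas releases no longer silently skip non-numeric columns in `mean()`.

```python
import pandas as pd

# Sample rows copied from the question (typed in by hand, not read from a CSV).
df = pd.DataFrame({
    "TimeStamp": ["10:27:00", "10:28:00", "10:29:00", "10:30:00"],
    "340": [1.953036, 1.973408, 0.000000, 2.567976],
    "341": [2.110234, 2.046361, 0.000000, 3.169928],
    "342": [1.981548, 1.806923, 0.014881, 3.479591],
})

# Remove the string column and reuse it as a TimedeltaIndex,
# so only numeric columns are left to average.
stamps = pd.to_timedelta(df.pop("TimeStamp"))
df = df.set_index(stamps)

# Mean of every 2-minute bin.
two_min = df.resample("2min").mean()
print(two_min)
```

Depending on the pandas version, the trailing all-NaN bin shown above may or may not appear, so the printed frame can have two or three rows.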

Drop the last (all-NaN) row with iloc:

df.resample("2min").mean().reset_index().iloc[:-1] 

# TimeStamp  340  341  342 
#0 10:27:00 1.963222 2.078298 1.894235 
#1 10:29:00 1.283988 1.584964 1.747236 
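
`iloc[:-1]` assumes the last bin is always the empty one. A slightly more defensive sketch, assuming you only want to discard bins in which every column is NaN, is to call `dropna(how="all")` before `reset_index()`. The sample frame is rebuilt from the question's rows, and `idx` and `result` are illustrative names.

```python
import pandas as pd

# Sample rows from the question, indexed by a TimedeltaIndex named "TimeStamp".
idx = pd.to_timedelta(["10:27:00", "10:28:00", "10:29:00", "10:30:00"])
df = pd.DataFrame(
    {
        "340": [1.953036, 1.973408, 0.000000, 2.567976],
        "341": [2.110234, 2.046361, 0.000000, 3.169928],
        "342": [1.981548, 1.806923, 0.014881, 3.479591],
    },
    index=idx,
).rename_axis("TimeStamp")

# Drop bins where every column is NaN, then turn the index back into a column.
# dropna must come before reset_index, otherwise the TimeStamp column
# would keep the empty row alive.
result = df.resample("2min").mean().dropna(how="all").reset_index()
print(result)
```

Note that this also removes empty bins in the middle of the data (gaps in the recording), which may or may not be what you want.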

If you would like the TimeStamp shifted by 30 seconds:

(df.resample("2min").mean().reset_index() 
    .assign(TimeStamp = lambda x: x.TimeStamp + pd.Timedelta('30 seconds')) 
    .iloc[:-1]) 

# TimeStamp  340  341  342 
#0 10:27:30 1.963222 2.078298 1.894235 
#1 10:29:30 1.283988 1.584964 1.747236 
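
Since the title asks about every two consecutive rows rather than fixed clock bins, positional grouping is another option. The sketch below is an alternative, not part of the answer above: it averages the numeric columns with the asker's `np.arange(len(df)) // 2` key and rebuilds `TimeStamp` as the midpoint of each pair by averaging its total seconds. `pairs`, `secs`, and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, with TimeStamp already parsed as timedeltas.
df = pd.DataFrame({
    "TimeStamp": pd.to_timedelta(["10:27:00", "10:28:00", "10:29:00", "10:30:00"]),
    "340": [1.953036, 1.973408, 0.000000, 2.567976],
    "341": [2.110234, 2.046361, 0.000000, 3.169928],
    "342": [1.981548, 1.806923, 0.014881, 3.479591],
})

# Rows 0-1 form group 0, rows 2-3 form group 1, and so on.
pairs = np.arange(len(df)) // 2

# Average the numeric columns only.
result = df.drop(columns="TimeStamp").groupby(pairs).mean()

# Average the timestamps as seconds, then convert back to timedeltas.
secs = df["TimeStamp"].dt.total_seconds().groupby(pairs).mean()
result.insert(0, "TimeStamp", pd.to_timedelta(secs, unit="s"))
print(result)
```

This pairs rows purely by position, so it only agrees with the 2-minute resample when the data has exactly one row per minute and no gaps.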