2017-07-20 174 views
2

我想請求您的幫助,對熊貓數據框執行操作。Python組以時間間隔

我的初始數據幀如下:

enter image description here

,我想重塑它在30秒的間隔,並計算平均爲每個組。

我用下面的:

df_a['avgValue']= df_a['value'].groupby([df_a['id_A'],df_a['course'], pd.TimeGrouper(freq='30S')]).transform(np.mean) 

,我得到以下幾點:

enter image description here

但TS_A不受30秒分組。例如,第一行和第二行應該在一個被合併,獲得預期的結果:

id_A    ts_A course weight avgValue 
id1 2017-04-27 01:35:30 cotton 3.5 150.000 
id1 2017-04-27 01:36:00 cotton 3.5 416.000 
... 

我的問題是:我應該如何改變上面的代碼,以獲得預期的結果?

很多預先感謝。 最好的問候, 卡羅

回答

2

我想你需要(假設ts_A設置爲DatetimeIndexGroupBy.mean,省略transform功能:

#if not datetimeindex 
#df['ts_A'] = pd.to_datetime(df['ts_A']) 
#df = df.set_index('ts_A') 


df = df_a['value'].groupby([df_a['id_A'], 
          df_a['course'], 
          df_a['weight'], 
          pd.TimeGrouper(freq='30S')]).mean().reset_index() 

或者:

df = df_a.groupby(['id_A','course','weight', 
        pd.TimeGrouper(freq='30S')])['value'].mean().reset_index() 
print (df) 
    id_A  course weight    ts_A  value 
0 id1  cotton  3.5 2017-04-27 01:35:30 150.000000 
1 id1  cotton  3.5 2017-04-27 01:36:00 416.666667 
2 id1  cotton  3.5 2017-04-27 01:36:30 700.000000 
3 id1  cotton  3.5 2017-04-27 01:37:00 950.000000 
4 id2 cotton blue  5.0 2017-04-27 02:35:30 150.000000 
5 id2 cotton blue  5.0 2017-04-27 02:36:00 450.000000 
6 id2 cotton blue  5.0 2017-04-27 02:36:30 520.666667 
7 id2 cotton blue  5.0 2017-04-27 02:37:00 610.000000 

解決方案與resample

df = df_a.groupby(['id_A','course','weight'])['value'].resample('30S').mean().reset_index() 
print (df) 
    id_A  course weight    ts_A  value 
0 id1  cotton  3.5 2017-04-27 01:35:30 150.000000 
1 id1  cotton  3.5 2017-04-27 01:36:00 416.666667 
2 id1  cotton  3.5 2017-04-27 01:36:30 700.000000 
3 id1  cotton  3.5 2017-04-27 01:37:00 950.000000 
4 id2 cotton blue  5.0 2017-04-27 02:35:30 150.000000 
5 id2 cotton blue  5.0 2017-04-27 02:36:00 450.000000 
6 id2 cotton blue  5.0 2017-04-27 02:36:30 520.666667 
7 id2 cotton blue  5.0 2017-04-27 02:37:00 610.000000 

設置:

所有的
d = {'weight': {0: 3.5, 1: 3.5, 2: 3.5, 3: 3.5, 4: 3.5, 5: 3.5, 6: 3.5, 7: 3.5, 8: 3.5, 9: 3.5, 10: 5.0, 11: 5.0, 12: 5.0, 13: 5.0, 14: 5.0, 15: 5.0, 16: 5.0, 17: 5.0, 18: 5.0, 19: 5.0}, 'value': {0: 100, 1: 200, 2: 350, 3: 400, 4: 500, 5: 600, 6: 700, 7: 800, 8: 900, 9: 1000, 10: 100, 11: 200, 12: 450, 13: 300, 14: 600, 15: 500, 16: 522, 17: 540, 18: 320, 19: 900}, 'ts_A': {0: '2017-04-27 01:35:40', 1: '2017-04-27 01:35:50', 2: '2017-04-27 01:36:00', 3: '2017-04-27 01:36:10', 4: '2017-04-27 01:36:20', 5: '2017-04-27 01:36:30', 6: '2017-04-27 01:36:40', 7: '2017-04-27 01:36:50', 8: '2017-04-27 01:37:00', 9: '2017-04-27 01:37:10', 10: '2017-04-27 02:35:40', 11: '2017-04-27 02:35:50', 12: '2017-04-27 02:36:00', 13: '2017-04-27 02:36:10', 14: '2017-04-27 02:36:20', 15: '2017-04-27 02:36:30', 16: '2017-04-27 02:36:40', 17: '2017-04-27 02:36:50', 18: '2017-04-27 02:37:00', 19: '2017-04-27 02:37:10'}, 'course': {0: 'cotton', 1: 'cotton', 2: 'cotton', 3: 'cotton', 4: 'cotton', 5: 'cotton', 6: 'cotton', 7: 'cotton', 8: 'cotton', 9: 'cotton', 10: 'cotton blue', 11: 'cotton blue', 12: 'cotton blue', 13: 'cotton blue', 14: 'cotton blue', 15: 'cotton blue', 16: 'cotton blue', 17: 'cotton blue', 18: 'cotton blue', 19: 'cotton blue'}, 'id_A': {0: 'id1', 1: 'id1', 2: 'id1', 3: 'id1', 4: 'id1', 5: 'id1', 6: 'id1', 7: 'id1', 8: 'id1', 9: 'id1', 10: 'id2', 11: 'id2', 12: 'id2', 13: 'id2', 14: 'id2', 15: 'id2', 16: 'id2', 17: 'id2', 18: 'id2', 19: 'id2'}} 
df_a = pd.DataFrame(d) 
df_a['ts_A'] = pd.to_datetime(df_a['ts_A']) 
df_a = df_a.set_index('ts_A') 
print (df_a) 
          course id_A value weight 
ts_A             
2017-04-27 01:35:40  cotton id1 100  3.5 
2017-04-27 01:35:50  cotton id1 200  3.5 
2017-04-27 01:36:00  cotton id1 350  3.5 
2017-04-27 01:36:10  cotton id1 400  3.5 
2017-04-27 01:36:20  cotton id1 500  3.5 
2017-04-27 01:36:30  cotton id1 600  3.5 
2017-04-27 01:36:40  cotton id1 700  3.5 
2017-04-27 01:36:50  cotton id1 800  3.5 
2017-04-27 01:37:00  cotton id1 900  3.5 
2017-04-27 01:37:10  cotton id1 1000  3.5 
2017-04-27 02:35:40 cotton blue id2 100  5.0 
2017-04-27 02:35:50 cotton blue id2 200  5.0 
2017-04-27 02:36:00 cotton blue id2 450  5.0 
2017-04-27 02:36:10 cotton blue id2 300  5.0 
2017-04-27 02:36:20 cotton blue id2 600  5.0 
2017-04-27 02:36:30 cotton blue id2 500  5.0 
2017-04-27 02:36:40 cotton blue id2 522  5.0 
2017-04-27 02:36:50 cotton blue id2 540  5.0 
2017-04-27 02:37:00 cotton blue id2 320  5.0 
2017-04-27 02:37:10 cotton blue id2 900  5.0 
+0

非常感謝您的回答。但是,我怎樣才能給包含avg的新創建列命名呢? –

+0

我正在處理樣本,但它不容易使用您的數據,因爲圖片。 – jezrael

+0

我明白了。很多預先感謝它。 –

2

首先,你應該設置日期時間列的索引,如TimeGrouper上的索引操作。

df = df_a.set_index('ts_A') 

然後通過操作計算調整組。最好首先對你感興趣的列進行分組,然後計算目標變量的平均值。

results = df.groupby(['id_A','course','weight', pd.TimeGrouper(freq='30S')])['value'].mean()