Elegant groupby and update in pandas?
I have the following pandas.DataFrame object:
       offset                      ts               op    time
0    0.000000 2015-10-27 18:31:40.318       Decompress   2.953
1    0.000000 2015-10-27 18:31:40.318  DeserializeBond   0.015
32   0.000000 2015-10-27 18:31:40.318         Compress  17.135
33   0.000000 2015-10-27 18:31:40.318       BuildIndex  19.494
34   0.000000 2015-10-27 18:31:40.318      InsertIndex   0.625
35   0.000000 2015-10-27 18:31:40.318         Compress  16.970
36   0.000000 2015-10-27 18:31:40.318       BuildIndex  18.954
37   0.000000 2015-10-27 18:31:40.318      InsertIndex   0.047
38   0.000000 2015-10-27 18:31:40.318         Compress  16.017
39   0.000000 2015-10-27 18:31:40.318       BuildIndex  17.814
40   0.000000 2015-10-27 18:31:40.318      InsertIndex   0.047
77   4.960683 2015-10-27 18:36:37.959       Decompress   2.844
78   4.960683 2015-10-27 18:36:37.959  DeserializeBond   0.000
108  4.960683 2015-10-27 18:36:37.959         Compress  17.758
109  4.960683 2015-10-27 18:36:37.959       BuildIndex  19.742
110  4.960683 2015-10-27 18:36:37.959      InsertIndex   0.110
111  4.960683 2015-10-27 18:36:37.959         Compress  16.267
112  4.960683 2015-10-27 18:36:37.959       BuildIndex  18.111
113  4.960683 2015-10-27 18:36:37.959      InsertIndex   0.062
I want to group by the (offset, ts, op) fields and sum the time values:
df = df.groupby(['offset', 'ts', 'op']).sum()
So far, so good:
                                                    time
offset   ts                      op
0.000000 2015-10-27 18:31:40.318 BuildIndex       56.262
                                 Compress         50.122
                                 Decompress        2.953
                                 DeserializeBond   0.015
                                 InsertIndex       0.719
4.960683 2015-10-27 18:36:37.959 BuildIndex       37.853
                                 Compress         34.025
                                 Decompress        2.844
                                 DeserializeBond   0.000
                                 InsertIndex       0.172
The problem is that I have to subtract Compress from BuildIndex within each group. I was recommended to use DataFrame.xs(), and I came up with the following:
diff = df.xs("BuildIndex", level="op") - df.xs("Compress", level="op")
diff['op'] = 'BuildIndex'
diff = diff.reset_index().groupby(['offset', 'ts', 'op']).agg(lambda val: val)
df.update(diff)
It works, but I have a strong feeling that there must be a more elegant way to solve this.
Can anyone suggest a better way to do this?
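This is great! Thank you very much for your help. It turns out that you can address multi-level columns as tuples, so after unstacking you can simply write: `df['time', 'BuildIndex'] -= df['time', 'Compress']`. Now I'm happy :-)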
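For anyone reading later, here is a minimal sketch of the unstack approach mentioned in the comment above. It uses a few rows copied from the question; the variable names (`raw`, `summed`, `wide`, `result`) are only illustrative, and it assumes every (offset, ts) group contains the same set of ops, so that unstacking does not introduce missing values.

```python
import pandas as pd

# A few rows taken from the question's data (not the full frame).
raw = pd.DataFrame({
    "offset": [0.0] * 6 + [4.960683] * 3,
    "ts": pd.to_datetime(
        ["2015-10-27 18:31:40.318"] * 6 + ["2015-10-27 18:36:37.959"] * 3
    ),
    "op": ["Compress", "BuildIndex", "InsertIndex"] * 3,
    "time": [17.135, 19.494, 0.625, 16.970, 18.954, 0.047,
             17.758, 19.742, 0.110],
})

# Sum the time per (offset, ts, op), exactly as in the question.
summed = raw.groupby(["offset", "ts", "op"]).sum()

# Move the 'op' index level into the columns; the columns become a
# two-level MultiIndex such as ('time', 'BuildIndex').
wide = summed.unstack("op")

# Multi-level columns can be addressed as tuples, so the per-group
# subtraction is a single column operation.
wide["time", "BuildIndex"] -= wide["time", "Compress"]

# Optionally move 'op' back into the index to restore the long layout.
result = wide.stack("op")
print(result)
```

Compared with the xs()/update() version, this avoids building a separate diff frame and re-aligning it by hand: once op is a column level, each op is just a column, and the index alignment on (offset, ts) is handled automatically.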