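Compare two columns in the same DataFrame and compute statistics from the comparison
I have a DataFrame with a datetime index and three columns: id, revenue, and cost.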
import numpy as np
import pandas as pd
from datetime import datetime

# Sample data: two ids, three rows each, with random revenue and cost
d = {'id': ['4573', '4573', '4573', '958245', '958245', '958245'],
     'revenue': np.random.uniform(size=6),
     'cost': np.random.uniform(size=6)}
e = ['2014-03-01','2014-04-01','2014-05-01','2014-05-01','2015-03-01','2015-02-01']
dateindex = [datetime.strptime(a, '%Y-%m-%d') for a in e]
df = pd.DataFrame(d)
df.index = dateindex
                cost      id   revenue
2014-03-01  0.445597    4573  0.901713
2014-04-01  0.774029    4573  0.908302
2014-05-01  0.104274    4573  0.278444
2014-05-01  0.938426  958245  0.755022
2015-03-01  0.647886  958245  0.125072
2015-02-01  0.267773  958245  0.557496
I want to perform various comparisons between revenue and cost for each id.
For example:
Pseudocode:
If Revenue > Cost > 0
    CountA = CountA + 1
Elif 0 < Revenue < Cost
    CountB = CountB + 1
Elif Revenue > 0 > Cost
    CountC = CountC + 1
Elif Revenue = 0 and Cost > 0
    CountD = CountD + 1
For case A, I thought I could do this:
df[['revenue']][df['id'] == '4573'] > df[['cost']][df['id'] == '4573']
But I get:
ValueError: Can only compare identically-labeled DataFrame objects
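A likely cause, as far as I can tell: double brackets such as df[['revenue']] return a one-column DataFrame, so the two operands have different column labels ('revenue' versus 'cost') and pandas refuses to compare them. Selecting each column as a Series avoids the mismatch. The sketch below uses hypothetical fixed values instead of the random data above, and it only checks revenue > cost, as the attempted line does; it does not add the cost > 0 part of case A.

```python
import pandas as pd

# Hypothetical fixed values standing in for the random data in the question
df = pd.DataFrame({
    "id": ["4573", "4573", "4573", "958245", "958245", "958245"],
    "revenue": [0.9, 0.2, 0.5, 0.0, 0.7, 0.3],
    "cost": [0.4, 0.8, -0.1, 0.6, 0.2, 0.9],
})

mask = df["id"] == "4573"

# .loc with a single column name returns a Series, so the comparison
# aligns on the row index only and the column names no longer matter
case_a = df.loc[mask, "revenue"] > df.loc[mask, "cost"]

# Summing a boolean Series counts the True values
count_a = int(case_a.sum())
```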
Is there a more efficient way to do what I am trying to do?
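One possible vectorized approach, offered as a sketch rather than a definitive answer: label every row with np.select, which picks the first matching condition just like the If/Elif chain in the pseudocode, then count the labels per id with pd.crosstab. The sample values, the "case" column, and the "other" fallback label are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed values standing in for the random data in the question
df = pd.DataFrame(
    {
        "id": ["4573", "4573", "4573", "958245", "958245", "958245"],
        "revenue": [0.9, 0.2, 0.5, 0.0, 0.7, 0.3],
        "cost": [0.4, 0.8, -0.1, 0.6, 0.2, 0.9],
    },
    index=pd.to_datetime(
        ["2014-03-01", "2014-04-01", "2014-05-01",
         "2014-05-01", "2015-03-01", "2015-02-01"]
    ),
)

rev, cost = df["revenue"], df["cost"]

# One boolean Series per case; np.select uses the first condition that is True
conditions = [
    (rev > cost) & (cost > 0),   # A: Revenue > Cost > 0
    (rev > 0) & (rev < cost),    # B: 0 < Revenue < Cost
    (rev > 0) & (cost < 0),      # C: Revenue > 0 > Cost
    (rev == 0) & (cost > 0),     # D: Revenue = 0 and Cost > 0
]
labels = ["A", "B", "C", "D"]

# Rows matching none of the cases get the fallback label "other"
df["case"] = np.select(conditions, labels, default="other")

# Rows are ids, columns are cases, cells are row counts
counts = pd.crosstab(df["id"], df["case"])
```

Here counts.loc['4573', 'A'] would play the role of CountA for id 4573. Cases that never occur in the data do not appear as columns, so reindexing the columns with the full label list may be needed if a fixed layout matters.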