Python: sum values of a third column if two columns have the same value

I have the following DataFrame df:

df
      a    b    i
0   1.0  3.0  2.0
1   1.0  3.0  3.0
2   1.0  3.0  1.0
3   1.0  3.0  3.0
4   1.0  3.0  7.0
5   1.0  3.0  8.0
6   1.0  4.0  4.0
7   1.0  4.0  0.0
8   1.0  3.0  2.0
9   1.0  3.0  1.0
10  1.0  3.0  2.0

I want to sum i over rows that share the same pair of a and b, so:

df2
     a    b     i
0  1.0  3.0  31.0
1  1.0  4.0   4.0
2  1.0  3.0   0.0

This is what I tried:

df2 = df.groupby(['a', 'b']).sum(['i']).reset_index()

Source

2016-11-29 emax

I think you need to select column i after the groupby, so that sum is applied to it:

df2 = df.groupby(['a', 'b'])['i'].sum().reset_index()
print(df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0

Or add the parameter as_index=False so that a DataFrame is returned:

df2 = df.groupby(['a', 'b'], as_index=False)['i'].sum()
print(df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0

Another solution, if you need one, is to group the Series itself:

df2 = df.i.groupby([df.a, df.b]).sum().reset_index()
print(df2)
     a    b     i
0  1.0  3.0  29.0
1  1.0  4.0   4.0
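
For anyone who wants to run the three variants side by side, here is a minimal self-contained sketch. It rebuilds df from the values shown in the question; the names v1, v2 and v3 are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

# Variant 1: select the column after grouping, then flatten the index
v1 = df.groupby(["a", "b"])["i"].sum().reset_index()

# Variant 2: keep the group keys as ordinary columns from the start
v2 = df.groupby(["a", "b"], as_index=False)["i"].sum()

# Variant 3: group the Series directly by the two key Series
v3 = df["i"].groupby([df["a"], df["b"]]).sum().reset_index()

print(v1)
```

All three group by value only, so rows 8 to 10 are merged with rows 0 to 5 even though they are not adjacent.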

EDIT:

If you need to keep groups apart by position (consecutive runs of the same a and b), group df by a Series g and use aggregate:

ab = df[['a', 'b']]

# compare each row with the previous one
print(ab.ne(ab.shift()))
        a      b
0    True   True
1   False  False
2   False  False
3   False  False
4   False  False
5   False  False
6   False   True
7   False  False
8   False   True
9   False  False
10  False  False

# check whether at least one column changed
print(ab.ne(ab.shift()).any(axis=1))
0      True
1     False
2     False
3     False
4     False
5     False
6      True
7     False
8      True
9     False
10    False
dtype: bool

# use the cumulative sum of the boolean Series as a group label
g = ab.ne(ab.shift()).any(axis=1).cumsum()
print(g)
0     1
1     1
2     1
3     1
4     1
5     1
6     2
7     2
8     3
9     3
10    3
dtype: int32

print(df.groupby(g).agg(dict(a='first', b='first', i='sum')))
     a    b     i
1  1.0  3.0  24.0
2  1.0  4.0   4.0
3  1.0  3.0   5.0
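
The steps above can be put together into one runnable sketch. It assumes df holds the values from the question, and it adds reset_index(drop=True) at the end (my addition, not part of the original answer) so the result gets the 0, 1, 2 index that the question's df2 shows.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

ab = df[["a", "b"]]

# True wherever a or b differs from the row above (the first row always counts)
changed = ab.ne(ab.shift()).any(axis=1)

# Running count of changes: every consecutive run gets its own label
g = changed.cumsum()

# Aggregate per run, then drop the run labels in favour of a default index
result = (
    df.groupby(g)
      .agg({"a": "first", "b": "first", "i": "sum"})
      .reset_index(drop=True)
)
print(result)
```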

Source

2016-11-29 22:02:48 jezrael

Compare each a, b combination with the previous row to see whether it changed, then take a cumsum to build the grouping array:

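ab = df[['a', 'b']].apply(tuple, axis=1)

df.groupby(ab.ne(ab.shift()).cumsum()) \
    .agg(dict(a='last', b='last', i='sum')) \
    .reindex(columns=df.columns)

Breaking it down

ab = df[['a', 'b']].apply(tuple, axis=1)
- I build a Series of tuples so I can tell when the combination changes
ab.ne(ab.shift())
- checks whether each tuple differs from the previous one
ab.ne(ab.shift()).cumsum()
- every time it differs, a True is added to the cumulative sum. This gives a convenient group label for each contiguous run of the same a, b pair
.agg(dict(a='last', b='last', i='sum'))
- specifies what to do with each column in each group. Take the last value of a and b, which is fine because I know they are the same throughout the group. Sum column i
.reindex(columns=df.columns)
- puts the columns back in their original order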
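
The breakdown above could also be wrapped in a small reusable function. This is only a sketch: sum_consecutive_runs and its parameters are hypothetical names chosen for illustration, not part of pandas or of the original answer.

```python
import pandas as pd

def sum_consecutive_runs(df, keys, value):
    """Sum `value` over each contiguous run of rows with identical `keys`.

    Hypothetical helper for illustration only.
    """
    # One tuple per row, so a change in any key column is a single inequality
    combo = df[keys].apply(tuple, axis=1)
    # True at the first row of each run; the running total labels the runs
    run_id = combo.ne(combo.shift()).cumsum()
    # Keys are constant inside a run, so 'last' simply picks that constant
    how = {k: "last" for k in keys}
    how[value] = "sum"
    out = df.groupby(run_id).agg(how)
    # Restore the column order of the input frame
    return out.reindex(columns=[c for c in df.columns if c in how])

df = pd.DataFrame({
    "a": [1.0] * 11,
    "b": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0],
    "i": [2.0, 3.0, 1.0, 3.0, 7.0, 8.0, 4.0, 0.0, 2.0, 1.0, 2.0],
})

result = sum_consecutive_runs(df, ["a", "b"], "i")
print(result)
```

Building tuples row by row is convenient but can be slow on large frames; comparing the key columns directly with ne and shift avoids the Python-level apply.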

Source

2016-11-29 22:08:56 piRSquared
