Pandas: transform rows to a single column based on a condition

I have the following pandas DataFrame named matches:

id | name | age 
1 | a  | 19 
1 | b  | 25 
2 | c  | 19 
2 | d  | 18

I use groupby + count() to count, per id, the rows where the value of one column (age) meets a condition (x < 21). The result is written to a new column (new_col):

matches['new_col'] = matches.groupby(['id'])['age'].transform(lambda x: x[x < 21].count())

The DataFrame then looks like this:

id | name | age | new_col 
1 | a  | 19 | 1 
1 | b  | 25 | 1 
2 | c  | 19 | 2 
2 | d  | 18 | 2
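
As a self-contained sketch of the counting step above, the snippet below rebuilds the sample data and fills new_col. It assumes d has age 18, as the output table shows. The variable alt is a hypothetical name for an equivalent variant that sums a boolean mask per id instead of filtering inside a lambda.

```python
import pandas as pd

matches = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "c", "d"],
    "age": [19, 25, 19, 18],  # assumption: d is 18, as in the output table
})

# Count, per id, the rows whose age is below 21 and broadcast that count
# back to every row of the same id.
matches["new_col"] = (
    matches.groupby(["id"])["age"].transform(lambda x: x[x < 21].count())
)

# Equivalent variant: sum a boolean mask per id (True counts as 1).
alt = (matches["age"] < 21).groupby(matches["id"]).transform("sum")

print(matches)
```

The mask-based variant avoids calling a Python lambda once per group, which may matter on large frames.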

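Now I would like to present the result in a more readable way: the name of every row that meets the condition (age < 21) should be written to a new column, e.g. result.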

I am hoping for something like this (there may well be other ways to achieve it, or even a way to do it already in the first step, where I add new_col):

id | name | age | new_col | result 
1 | a  | 19 | 1  | a 
1 | b  | 25 | 1  | a 
2 | c  | 19 | 2  | c,d 
2 | d  | 18 | 2  | c,d

The last step (adding the result column) is where I am currently stuck.

2017-01-14 beta

This is what I have now: groupby + apply, where the applied function adds the new column:

matches = matches.groupby(['id']).apply(concat)

where concat is:

def concat(group): 
    group['result'] = "{%s}" % ', '.join(group['name'][group['age'] < 21]) 
    return group
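
One possible variant of the same idea that avoids apply is sketched below. It masks the names that fail the condition and joins the remaining ones per id with transform. The variable young_names is a hypothetical name, the sample data assumes d has age 18, and the sketch leaves out the braces that the "{%s}" format adds.

```python
import pandas as pd

matches = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "c", "d"],
    "age": [19, 25, 19, 18],  # assumption: d is 18, as in the expected output
})

# Replace the names of rows that fail the condition with NaN ...
young_names = matches["name"].where(matches["age"] < 21)

# ... then join the surviving names per id and broadcast the joined
# string back to every row of that id.
matches["result"] = young_names.groupby(matches["id"]).transform(
    lambda s: ", ".join(s.dropna())
)

print(matches)
```

An id with no row under 21 ends up with an empty string here rather than a missing value.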

Are there any other/better solutions?

2017-01-14 09:35:30 beta

First filter the rows by boolean indexing, then aggregate, and finally join the result back to the original:

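# named aggregation; the dict form agg({'result': ..., 'new_col': ...}) no longer works in pandas >= 1.0
matches1 = (matches[matches.age < 21]
            .groupby(['id'])['name']
            .agg(new_col=len, result=', '.join))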
print (matches1)
    new_col result
id
1         1      a
2         2   c, d

print (matches.join(matches1, on='id'))
   id name  age  new_col result
0   1    a   19        1      a
1   1    b   25        1      a
2   2    c   19        2   c, d
3   2    d   18        2   c, d
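
The filter, aggregate and join steps can also be run end to end as in the sketch below. It uses named aggregation, which needs pandas 0.25 or later. The extra row with id 3 is a hypothetical addition, not part of the original data: it shows that an id with no row under 21 receives missing values from the join, which are filled in explicitly here.

```python
import pandas as pd

matches = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "name": ["a", "b", "c", "d", "e"],
    "age": [19, 25, 19, 18, 30],  # id 3 is a made-up group with no age < 21
})

# 1. Filter by boolean indexing, 2. aggregate per id.
under_21 = (
    matches[matches["age"] < 21]
    .groupby("id")["name"]
    .agg(new_col="count", result=", ".join)
)

# 3. Join the per-id aggregates back onto the original rows.
out = matches.join(under_21, on="id")

# Ids that were filtered out entirely have no match in under_21.
out["new_col"] = out["new_col"].fillna(0).astype(int)
out["result"] = out["result"].fillna("")

print(out)
```

Whether a count of 0 and an empty string are the right fill values depends on how the result is used downstream.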

Another solution uses transform twice, but sort_values is needed first so that ffill can later fill the rows where age is >= 21:

matches = matches.sort_values(['id','age']) 
g = matches[matches.age < 21].groupby(['id'])['name'] 
matches['new_col'] = g.transform(len) 
matches['result'] = g.transform(', '.join) 
matches[['new_col','result']] = matches[['new_col','result']].ffill() 

print (matches)
   id name  age  new_col result
0   1    a   19        1      a
1   1    b   25        1      a
3   2    d   18        2   d, c
2   2    c   19        2   d, c

To better explain why sorting is needed, here the df is changed a bit:

print (matches)
   id name  age
0   1    a   25   <- first value is filtered out by the condition
1   1    b   12
2   2    c   19
3   2    d   18

matches = matches.sort_values(['id','age']) 
g = matches[matches.age < 21].groupby(['id'])['name'] 
matches['new_col'] = g.transform(len) 
matches['result'] = g.transform(', '.join) 
matches[['new_col','result']] = matches[['new_col','result']].ffill() 

print (matches)
   id name  age  new_col result
1   1    b   12        1      b
0   1    a   25        1      b
3   2    d   18        2   d, c
2   2    c   19        2   d, c

print (matches.sort_index())
   id name  age  new_col result
0   1    a   25        1      b
1   1    b   12        1      b
2   2    c   19        2   d, c
3   2    d   18        2   d, c
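
One caveat with a plain ffill is that it fills across id boundaries, so an id with no row under 21 would inherit the values of the id before it. A possible variant, sketched below under the assumption that filling should stay within each id, uses the group-wise ffill and bfill instead. This also removes the need to sort, so the names are joined in their original row order (c, d rather than d, c). The variable cols is a hypothetical helper name.

```python
import pandas as pd

matches = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "name": ["a", "b", "c", "d"],
    "age": [25, 12, 19, 18],  # first row is filtered out by the condition
})

g = matches[matches["age"] < 21].groupby("id")["name"]

# Rows that fail the condition are absent from g, so they get NaN here.
matches["new_col"] = g.transform("count")
matches["result"] = g.transform(", ".join)

# Fill the gaps forwards and backwards, but only within the same id.
cols = ["new_col", "result"]
matches[cols] = matches.groupby("id")[cols].ffill()
matches[cols] = matches.groupby("id")[cols].bfill()

print(matches)
```

With this variant an id that has no qualifying row keeps NaN in both columns, and new_col is stored as float because of the intermediate missing values.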

2017-01-14 09:57:16 jezrael
