2017-08-16 160 views
2

I am building a DataFrame with a pandas groupby transform:

import pandas as pd 

df1 = pd.DataFrame({  
"Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] ,   
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle",  
"Portland"] })   

df1.groupby(["City"])['Name'].transform(lambda x:  
','.join(x)).drop_duplicates()  
I want the output as  

Name City     
Alice,Bob,Mallory,Bob  Seattle   
Mallory,Mallory Portland   

but I am getting only
Name   
Alice,Bob,Mallory,Bob     
Mallory,Mallory   

This example has only a few columns, but my actual problem has far too many
columns, so I cannot use
df1['Name']= df1.groupby(['City'])['Name'].transform(lambda x:   
','.join(x))    
df1.groupby(['City','Name'], as_index=False)    
df1.drop_duplicates()   

because I would have to write the same code for every column.
Is there a way to do this without writing a separate transform for each column?

Answers

2

Aggregating one column

I think you need apply with ','.join, then change the column order by selecting with double brackets [[]]:

df = df1.groupby(["City"])['Name'].apply(','.join).reset_index() 
df = df[['Name','City']] 
print (df) 
                    Name      City
0        Mallory,Mallory  Portland
1  Alice,Bob,Mallory,Bob   Seattle

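transform, by contrast, returns a value for every original row, so it is meant for creating a new column of aggregated values: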

df1['new'] = df1.groupby("City")['Name'].transform(','.join) 
print (df1) 
       City     Name                    new
0   Seattle    Alice  Alice,Bob,Mallory,Bob
1   Seattle      Bob  Alice,Bob,Mallory,Bob
2  Portland  Mallory        Mallory,Mallory
3   Seattle  Mallory  Alice,Bob,Mallory,Bob
4   Seattle      Bob  Alice,Bob,Mallory,Bob
5  Portland  Mallory        Mallory,Mallory
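
The difference between transform and agg shows up in the shape of the result, which is also why the question's transform(...).drop_duplicates() lost the City column. A minimal sketch, reusing the question's data (the names per_row and per_group are only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# transform: one value per original row, aligned with df1's index.
# The result is only the Name Series, so City is not part of it.
per_row = df1.groupby("City")["Name"].transform(",".join)

# agg: one value per group, indexed by the group key (City).
per_group = df1.groupby("City")["Name"].agg(",".join)

print(len(per_row), len(per_group))
```

Calling reset_index() on per_group turns the City index back into a regular column, which gives the two-column table the question asks for.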

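Two columns and multiple aggregations

For multiple columns use agg, either with the columns listed in [] or with no columns specified to join all string columns: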

df1 = pd.DataFrame({  
"Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
"Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1" , "Mallory1"],  
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle",  
"Portland"] }) 
print (df1) 
       City     Name     Name2
0   Seattle    Alice    Alice1
1   Seattle      Bob      Bob1
2  Portland  Mallory  Mallory1
3   Seattle  Mallory  Mallory1
4   Seattle      Bob      Bob1
5  Portland  Mallory  Mallory1

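# Select several columns with a list (double brackets); a bare tuple of names is rejected by recent pandas
df = df1.groupby('City')[['Name', 'Name2']].agg(','.join).reset_index()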
print (df) 
       City                   Name                      Name2
0  Portland        Mallory,Mallory          Mallory1,Mallory1
1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1

And if you need to aggregate all columns:

df = df1.groupby('City').agg(','.join).reset_index() 
print (df) 
       City                   Name                      Name2
0  Portland        Mallory,Mallory          Mallory1,Mallory1
1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1

df1 = pd.DataFrame({  
"Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] , 
"Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1" , "Mallory1"],  
"City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"], 
'Numbers':[1,5,4,3,2,1]}) 
print (df1) 
       City     Name     Name2  Numbers
0   Seattle    Alice    Alice1        1
1   Seattle      Bob      Bob1        5
2  Portland  Mallory  Mallory1        4
3   Seattle  Mallory  Mallory1        3
4   Seattle      Bob      Bob1        2
5  Portland  Mallory  Mallory1        1


df = df1.groupby('City').agg({'Name': ','.join, 
           'Name2': ','.join, 
           'Numbers': 'max'}).reset_index() 
print (df) 
       City                   Name                      Name2  Numbers
0  Portland        Mallory,Mallory          Mallory1,Mallory1        4
1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1        5
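
If there are too many columns to list by hand, the dictionary passed to agg can be built from the column dtypes instead. A sketch under two assumptions that the question does not state: every non-numeric column other than the key holds strings, and max is the reduction wanted for every numeric column. The names key, num_cols and agg_spec are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1", "Mallory1"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Numbers": [1, 5, 4, 3, 2, 1],
})

key = "City"
num_cols = set(df1.select_dtypes(include="number").columns)

# Build {column: aggregation} for every column except the grouping key.
agg_spec = {}
for col in df1.columns:
    if col == key:
        continue
    # Assumption: numeric columns take "max", everything else is a string to join.
    agg_spec[col] = "max" if col in num_cols else ",".join

df = df1.groupby(key).agg(agg_spec).reset_index()
print(df)
```

Adding another string column to df1 then needs no change to the aggregation code, since the loop picks it up automatically.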
+0

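OK, thanks, this works. One more thing: suppose I have one more column with numbers and need to compute its max or min along with the operation above. How would I add two agg functions in one statement? – vatsal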

+0

See the edited answer. – jezrael

+1

Thank you very much :) – vatsal

1

You could do

In [42]: df1.groupby('City')['Name'].agg(','.join).reset_index(name='Name') 
Out[42]: 
       City                   Name
0  Portland        Mallory,Mallory
1   Seattle  Alice,Bob,Mallory,Bob

Or,

In [49]: df1.groupby('City', as_index=False).agg({'Name': ','.join}) 
Out[49]: 
       City                   Name
0  Portland        Mallory,Mallory
1   Seattle  Alice,Bob,Mallory,Bob

For multiple aggregations

df1.groupby('City', as_index=False).agg(
     {'Name': ','.join, 'Name2': ','.join, 'Numbers': 'max'}) 
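
On pandas 0.25 or later, the same kind of multi-function aggregation can also be written with named aggregation, which lets you pick the output column names. This is a sketch and not part of the original answer; the output names Names and MaxNumber are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Numbers": [1, 5, 4, 3, 2, 1],
})

# Each keyword is an output column: (input column, aggregation function).
out = (
    df1.groupby("City")
       .agg(Names=("Name", ",".join), MaxNumber=("Numbers", "max"))
       .reset_index()
)
print(out)
```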
+0

If I have more columns like Name2, how would I use the function above to get the same string-aggregation result? – vatsal

+0

Check my answer. – jezrael