2014-12-04 59 views

Concatenate strings from several rows using Pandas groupby

I want to merge several strings in a dataframe based on a groupby in Pandas.

This is my code so far:

import pandas as pd
from io import StringIO

data = StringIO("""
"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
""")

# load the string as a stream into a dataframe; the data has no header row,
# so pass header=None (header=0 would consume the first data row) and supply the names
df = pd.read_csv(data, header=None, names=["name", "text", "date"], parse_dates=[2])

# add a column with the month
df["month"] = df["date"].dt.month

I want the end result to look like this:

(image of the expected output not available)

I don't understand how I can use groupby and apply some kind of concatenation to the strings in the "text" column. Any help is appreciated!

Answers


You can group by the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda that joins the text entries:

In [119]: 

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x)) 
df[['name','text','month']].drop_duplicates() 
Out[119]: 
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12

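Here I subset the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates. Because transform returns one value per original row, every row in a group carries the same joined string, so dropping duplicates leaves one row per group.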
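
To make the row alignment concrete, here is a minimal sketch on a small hand-built frame (the sample rows are an assumption for illustration, not the question's full data). It suggests why drop_duplicates is needed after transform:

```python
import pandas as pd

# Small illustrative frame (assumed sample rows, not the full data from the question)
df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name2"],
    "month": [11, 11, 12, 11],
    "text": ["hej", "du", "aj", "fin"],
})

# transform broadcasts the joined string back to every row of its group,
# so the result has the same length and index as df
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))

# Replace the text column with the joined strings, then keep one row per group
result = df.assign(text=joined)[["name", "text", "month"]].drop_duplicates()
print(result)
```

Note that the surviving rows keep their original index labels, which is why the output above shows 0, 2, 4, 6 rather than 0, 1, 2, 3.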

EDIT: actually, I can just call apply and then reset_index:

In [124]: 

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index() 

Out[124]: 
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite

UPDATE

The lambda is unnecessary here; the bound method ','.join can be passed directly:

In[38]: 
df.groupby(['name','month'])['text'].apply(','.join).reset_index() 

Out[38]: 
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
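
One caveat: passing ','.join directly assumes every value in a group is a string, since str.join raises a TypeError on None or NaN. If the column may contain missing values, something along these lines could be used instead. This is only a sketch; join_text is a hypothetical helper name, and skipping missing entries is an assumption about the desired behaviour:

```python
import pandas as pd

# Illustrative frame with one missing text value (assumed data)
df = pd.DataFrame({
    "name": ["name1", "name1", "name2", "name2"],
    "month": [11, 11, 11, 11],
    "text": ["hej", "du", "fin", None],
})

def join_text(values):
    # Hypothetical helper: skip missing entries and coerce the rest to str before joining
    return ",".join(values.dropna().astype(str))

result = df.groupby(["name", "month"])["text"].apply(join_text).reset_index()
print(result)
```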

The answer by EdChum gives you a lot of flexibility, but if you just want to collect the strings into a column of list objects, you can also do:

output_series = df.groupby(['name','month'])['text'].apply(list)
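
As a rough illustration of what that returns (the sample rows are assumed), the result is a Series indexed by (name, month) whose values are Python lists. The lists can still be joined into strings later with Series.str.join:

```python
import pandas as pd

# Small illustrative frame (assumed sample rows)
df = pd.DataFrame({
    "name": ["name1", "name1", "name1", "name2"],
    "month": [11, 11, 12, 11],
    "text": ["hej", "du", "aj", "fin"],
})

# Each group becomes one Python list
output_series = df.groupby(["name", "month"])["text"].apply(list)

# Optional: turn the group keys back into ordinary columns
as_frame = output_series.reset_index()

# Optional: join each list into a comma-separated string afterwards
joined = output_series.str.join(",")
print(as_frame)
```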


Note that this only works on one column at a time. – ybull 2017-10-18 19:27:23
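
For the string-joining case, one possible way to handle several columns in a single pass is to give agg a dict that maps each column to a function. The sketch below assumes a second string column named tag, which is hypothetical and not part of the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["name1", "name1", "name2"],
    "month": [11, 11, 11],
    "text": ["hej", "du", "fin"],
    "tag": ["a", "b", "c"],  # hypothetical second string column, for illustration only
})

# Map each column to the function that should aggregate it
result = (
    df.groupby(["name", "month"])
      .agg({"text": ",".join, "tag": ",".join})
      .reset_index()
)
print(result)
```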