Concatenate strings from several rows using Pandas groupby

I want to merge several strings in a dataframe based on a groupby in Pandas.

This is my code so far:

import pandas as pd
from io import StringIO

data = StringIO("""
"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
""")

# load string as stream into dataframe
# header=None because the data has no header row; header=0 would discard the first record
df = pd.read_csv(data, header=None, names=["name","text","date"], parse_dates=[2])

# add column with month
df["month"] = df["date"].dt.month

I want the end result to look like this:

[image of the desired output]

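I don't understand how I can use groupby and apply some kind of concatenation to the strings in the "text" column. Any help is appreciated!

Source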

2014-12-04 mattiasostmar

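You can groupby the 'name' and 'month' columns, then call transform, which returns data aligned to the original df, and apply a lambda in which we join the text entries: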

In [119]:

df['text'] = df[['name','text','month']].groupby(['name','month'])['text'].transform(lambda x: ','.join(x))
df[['name','text','month']].drop_duplicates()
Out[119]:
    name         text  month
0  name1       hej,du     11
2  name1        aj,oj     12
4  name2     fin,katt     11
6  name2  mycket,lite     12

Here I take a subset of the original df by passing a list of the columns of interest, df[['name','text','month']], and then call drop_duplicates.
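A minimal sketch of why drop_duplicates is needed after transform. It assumes a small hand-built frame with a few values from the question's data; the variable names `joined` and `result` are mine, not from the answer. transform returns one value per original row, so every row in a group carries the same joined string until the duplicates are dropped.

```python
import pandas as pd

# Small frame using a few rows from the question's data (built by hand for illustration)
df = pd.DataFrame({
    "name": ["name1", "name1", "name2", "name2"],
    "text": ["hej", "du", "fin", "katt"],
    "month": [11, 11, 11, 11],
})

# transform returns a Series with the same index as df,
# so each row in a group receives the full joined string
joined = df.groupby(["name", "month"])["text"].transform(lambda x: ",".join(x))
print(joined.tolist())

# After replacing the text column, rows within a group are identical;
# drop_duplicates keeps the first row of each group
result = df.assign(text=joined).drop_duplicates()
print(result)
```

Note that the surviving rows keep their original index labels, which is why the output above shows 0, 2, 4, 6 rather than 0 to 3.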

EDIT: actually I can just call apply and then reset_index:

In [124]:

df.groupby(['name','month'])['text'].apply(lambda x: ','.join(x)).reset_index()

Out[124]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite

Update

The lambda is unnecessary here:

In[38]:
df.groupby(['name','month'])['text'].apply(','.join).reset_index()

Out[38]:
    name  month         text
0  name1     11       hej,du
1  name1     12        aj,oj
2  name2     11     fin,katt
3  name2     12  mycket,lite
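For anyone who wants to run the whole thing end to end, here is a self-contained sketch built from the question's data. Two things in it are my own choices rather than part of the answer above: it reads the CSV with header=None (the data has no header row), and it uses agg instead of apply, which for a plain reduction like this should give the same result.

```python
import pandas as pd
from io import StringIO

data = StringIO('''"name1","hej","2014-11-01"
"name1","du","2014-11-02"
"name1","aj","2014-12-01"
"name1","oj","2014-12-02"
"name2","fin","2014-11-01"
"name2","katt","2014-11-02"
"name2","mycket","2014-12-01"
"name2","lite","2014-12-01"
''')

# header=None (my assumption): the data has no header row, so keep every record
df = pd.read_csv(data, header=None, names=["name", "text", "date"], parse_dates=[2])
df["month"] = df["date"].dt.month

# Pass the bound method ",".join directly; agg calls it once per group
result = df.groupby(["name", "month"])["text"].agg(",".join).reset_index()
print(result)
```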

Source

2014-12-04 15:54:19 EdChum

The answer by EdChum gives you a lot of flexibility, but if you just want to concatenate the strings into a column of list objects you can also do:

output_series = df.groupby(['name','month'])['text'].apply(list)
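A small sketch of what that line produces, assuming a hand-built three-row frame (the frame and the `as_strings` name are mine, for illustration only). Each group collapses to a Python list, indexed by name and month, and the lists can still be joined into strings afterwards with Series.str.join.

```python
import pandas as pd

# Hand-built frame for illustration
df = pd.DataFrame({
    "name": ["name1", "name1", "name2"],
    "month": [11, 11, 11],
    "text": ["hej", "du", "fin"],
})

# Each (name, month) group becomes one list of the original strings
output_series = df.groupby(["name", "month"])["text"].apply(list)
print(output_series)

# If a single string is needed later, the lists can be joined element-wise
as_strings = output_series.str.join(",")
print(as_strings)
```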

Source

2017-08-28 19:18:24

Note that this only works for one column at a time. – ybull 2017-10-18 19:27:23
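One possible way around the one-column limit, sketched under assumptions: the "tag" column below is hypothetical and not in the question's data, and this uses agg with a string join rather than apply(list). Selecting a list of columns and calling agg applies the function to each selected column separately.

```python
import pandas as pd

# "tag" is a hypothetical second string column added for illustration
df = pd.DataFrame({
    "name": ["name1", "name1", "name2"],
    "month": [11, 11, 11],
    "text": ["hej", "du", "fin"],
    "tag": ["a", "b", "c"],
})

# agg runs ",".join on each selected column within each group;
# this only works when every selected column holds strings
result = df.groupby(["name", "month"])[["text", "tag"]].agg(",".join).reset_index()
print(result)
```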
