Pandas rolling sum on a string column

I'm using Python 3 and pandas version '0.19.2'.

I have a pandas DataFrame as follows:

chat_id  line
1        'Hi.'
1        'Hi, how are you?.'
1        'I'm well, thanks.'
2        'Is it going to rain?.'
2        'No, I don't think so.'

I want to group by 'chat_id' and then do something like a rolling sum on 'line' to get the following:

chat_id  line                     conversation
1        'Hi.'                    'Hi.'
1        'Hi, how are you?.'      'Hi. Hi, how are you?.'
1        'I'm well, thanks.'      'Hi. Hi, how are you?. I'm well, thanks.'
2        'Is it going to rain?.'  'Is it going to rain?.'
2        'No, I don't think so.'  'Is it going to rain?. No, I don't think so.'

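I believe df.groupby('chat_id')['line'].cumsum() only works on numeric columns.

I also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in the full conversation, but then I couldn't figure out how to unpack that list to create the "rolling sum" style conversation column.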

2017-04-23 user3591836

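Interesting. It works if you call 'cumsum' on a Series, but it raises an error when called on a groupby object. – ayhan

For me, apply with Series.cumsum works; add a space as a separator if needed: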

df['new'] = df.groupby('chat_id')['line'].apply(lambda x: (x + ' ').cumsum().str.strip())
print(df)
   chat_id                   line                                          new
0        1                    Hi.                                          Hi.
1        1      Hi, how are you?.                        Hi. Hi, how are you?.
2        1      I'm well, thanks.      Hi. Hi, how are you?. I'm well, thanks.
3        2  Is it going to rain?.                        Is it going to rain?.
4        2  No, I don't think so.  Is it going to rain?. No, I don't think so.

df['line'] = df['line'].str.strip("'")
df['new'] = df.groupby('chat_id')['line'].apply(lambda x: "'" + (x + ' ').cumsum().str.strip() + "'")
print(df)
   chat_id                   line  \
0        1                    Hi.
1        1      Hi, how are you?.
2        1      I'm well, thanks.
3        2  Is it going to rain?.
4        2  No, I don't think so.

                                             new
0                                          'Hi.'
1                        'Hi. Hi, how are you?.'
2      'Hi. Hi, how are you?. I'm well, thanks.'
3                        'Is it going to rain?.'
4  'Is it going to rain?. No, I don't think so.'
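
The snippets above were written against pandas 0.19.2. On newer pandas releases, groupby(...).apply may return a result that also carries the group key in its index, so assigning it straight back to a column can fail. A minimal alternative sketch, not part of the original answer, is to use transform (which keeps the original index) together with itertools.accumulate from the standard library instead of cumsum on strings. The helper name running_join and the column name conversation are chosen here for illustration only.

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question, without the surrounding quote characters.
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})


def running_join(lines, sep=" "):
    """Cumulative concatenation of one group's lines (hypothetical helper)."""
    # accumulate yields: l0, l0+sep+l1, l0+sep+l1+sep+l2, ...
    joined = accumulate(lines, lambda so_far, nxt: so_far + sep + nxt)
    # Return a Series aligned to the group's own index so transform can
    # place every value back on its original row.
    return pd.Series(list(joined), index=lines.index)


# transform calls running_join once per chat_id group.
df["conversation"] = df.groupby("chat_id")["line"].transform(running_join)
print(df["conversation"].tolist())
```

Because accumulate works on plain Python strings, this sketch does not depend on whether cumsum supports the column's dtype in a given pandas version.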

2017-04-23 08:54:38 jezrael

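For me, this results in: ValueError: cannot reindex from a duplicate axis – user3591836

What is your pandas version? print(pd.show_versions()). I ask because I can't reproduce your error. I tested with duplicate values and with a duplicate index, and everything works perfectly in version '0.19.2'. – jezrael

Sorry, you're right. I had to call reset_index() on the df first, and then it works. – user3591836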
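
The fix from the last comment can be sketched as follows. The thread does not say where the duplicate index labels came from; this example assumes they came from concatenating two frames, and part_a and part_b are hypothetical names. drop=True is also an assumption: the comment only mentions reset_index(), and drop=True simply avoids keeping the old labels as an extra column.

```python
import pandas as pd

# Two hypothetical pieces of the chat log, each with its own 0..1 index.
part_a = pd.DataFrame({"chat_id": [1, 1],
                       "line": ["Hi.", "Hi, how are you?."]})
part_b = pd.DataFrame({"chat_id": [2, 2],
                       "line": ["Is it going to rain?.", "No, I don't think so."]})

# Concatenating keeps the original labels, so the index becomes 0, 1, 0, 1.
df = pd.concat([part_a, part_b])
unique_before = df.index.is_unique

# Replace the index with a fresh 0..n-1 range before the groupby step.
df = df.reset_index(drop=True)
unique_after = df.index.is_unique

print(unique_before, unique_after)
```

Checking df.index.is_unique before assigning a grouped result back to a column is a quick way to tell whether this step is needed.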
