2016-03-30 30 views

Group by and join a text column

I have a CSV file with this header: text|business_id

I want to group all the text that belongs to one business.

I am using: review_data=review_data.groupby(['business_id'])['text'].apply("".join)

review_data looks like this:

            text \ 
0  mr hoagi institut walk doe seem like throwback... 
1  excel food superb custom servic miss mario mac... 
2  yes place littl date open weekend staff alway ... 

     business_id 
0  5UmKMjUEUNdYWqANhGckJw 
1  5UmKMjUEUNdYWqANhGckJw 
2  5UmKMjUEUNdYWqANhGckJw 

I get this error: TypeError: sequence item 131: expected string, float found

These are lines 130 to 132:

130 use order fair often past 2 year food get progress wors everi time order doesnt help owner alway regist rude everi time final decid im done dont think feel let inconveni order food restaur let alon one food isnt even good also insid dirti heck deliv food bmw cant buy scrub brush found golden dragon collier squar 100 time better|SQ0j7bgSTazkVQlF5AnqyQ 
131 popular denni|wqu7ILomIOPSduRwoWp4AQ 
132 want smth quick late night would say denni|wqu7ILomIOPSduRwoWp4AQ 

Does review_data = review_data.groupby(['business_id'])['text'].apply("".join) work? It looks like you are concatenating the index number – EdChum

Yes, that is what I want. But I still get an error when reading some rows: TypeError: sequence item 131: expected string, float found – severine

That means you have missing data. You need to post sample data and code that reproduce this error – EdChum
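A minimal sketch of how missing data can lead to this kind of error. The two-row sample is hypothetical (it is not the asker's file); it only assumes the same text|business_id layout, with one empty text field. pandas reads the empty field as NaN, which is a float, and str.join refuses anything that is not a string.

```python
import io

import pandas as pd

# Hypothetical sample in the text|business_id layout; the second row has an empty text field.
raw = (
    "text|business_id\n"
    "popular denni|wqu7ILomIOPSduRwoWp4AQ\n"
    "|wqu7ILomIOPSduRwoWp4AQ\n"
)
sample = pd.read_csv(io.StringIO(raw), sep="|")

# The empty field is loaded as a missing value (NaN).
missing_count = int(sample["text"].isnull().sum())
print("rows with missing text:", missing_count)

# Joining a list that contains NaN fails, because NaN is a float and not a string.
try:
    "".join(sample["text"].tolist())
    join_error = None
except TypeError as exc:
    join_error = exc
print("join error:", join_error)
```

Counting the nulls in the text column first (as done with isnull().sum() here) is a quick way to confirm whether this is what is happening in the real file.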

Answer

I think you need to filter for notnull data with boolean indexing before the groupby:

print(review_data)
      text    business_id 
0 mr hoagi 5UmKMjUEUNdYWqANhGckJw 
1 excel food 5UmKMjUEUNdYWqANhGckJw 
2   NaN 5UmKMjUEUNdYWqANhGckJw 
3 yes place 5UmKMjUEUNdYWqANhGckJw 


review_data = review_data[review_data['text'].notnull()] 
print(review_data)
      text    business_id 
0 mr hoagi 5UmKMjUEUNdYWqANhGckJw 
1 excel food 5UmKMjUEUNdYWqANhGckJw 
3 yes place 5UmKMjUEUNdYWqANhGckJw 

review_data=review_data.groupby(['business_id'])['text'].apply("".join) 
print(review_data)
business_id 
5UmKMjUEUNdYWqANhGckJw mr hoagi excel food yes place 
Name: text, dtype: object
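
The steps above can be put together as a self-contained sketch. The small DataFrame is rebuilt by hand from the sample rows shown in this answer, and the variable names clean and joined are only illustrative. One deliberate difference, which is an assumption about what is wanted: it joins with " ".join rather than "".join, because "".join concatenates the reviews with no separator between them.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt by hand from the answer above; row 2 has a missing text.
review_data = pd.DataFrame({
    "text": ["mr hoagi", "excel food", np.nan, "yes place"],
    "business_id": ["5UmKMjUEUNdYWqANhGckJw"] * 4,
})

# Boolean indexing: keep only the rows whose text is not null.
clean = review_data[review_data["text"].notnull()]

# Join the remaining texts per business, separated by a single space
# (assumption: a space is wanted; the original code uses "".join).
joined = clean.groupby("business_id")["text"].apply(" ".join)
print(joined)
```

If the rows with missing text should be kept rather than dropped, replacing the missing values with an empty string via fillna("") before the groupby is another option.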