Sentence detection and extraction into the same data frame

I have the following data frame:
reviews <- data.frame(value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
"Inexpensive. An improvement over integrated graphics.",
"I love that product so excite. I will order again if I need more .",
"Excellent card, great graphics."),
user = c(1,2,3,4),
Review_Id = c("101968","101968","210546","112546"),
stringsAsFactors = FALSE)
and I need the following desired output:
user review_Id sentence
1 101968 Made with high quality materials.
1 101968 Very Good product
2 101968 Inexpensive.
2 101968 An improvement over integrated graphics.
3 210546 I love that product so excite.
3 210546 I will order again if I need more .
4 112546 Excellent card, great graphics.
I was thinking of something like this: sent_detect(reviews$value)
But how can I combine this function with the data frame to get the desired output?
Is your data really that clean? (For example, does every sentence end with a period followed by a space?) – A5C1D2H2I1M1N2O1R2T1 2015-03-03 11:19:53
If not, you could try [this](http://www.inside-r.org/packages/cran/openNLP/docs/Maxent_Sent_Token_Annotator); there is an example at the end – NicE 2015-03-03 11:25:58
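
If the data is as clean as the first comment asks about, a minimal base R sketch might look like the following. It does not use sent_detect; instead it swaps in `strsplit` with a lookbehind regex, under the assumption that every sentence ends in `.`, `!` or `?` followed by whitespace. The names `sentences`, `n` and `result` are made up for illustration.

```r
reviews <- data.frame(
  value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
            "Inexpensive. An improvement over integrated graphics.",
            "I love that product so excite. I will order again if I need more .",
            "Excellent card, great graphics."),
  user = c(1, 2, 3, 4),
  Review_Id = c("101968", "101968", "210546", "112546"),
  stringsAsFactors = FALSE
)

# Assumption: a sentence boundary is whitespace that directly follows
# ".", "!" or "?". Abbreviations such as "e.g. " would be split wrongly.
sentences <- strsplit(reviews$value, "(?<=[.!?])\\s+", perl = TRUE)

# Number of sentences found in each review
n <- lengths(sentences)

# Repeat user and Review_Id once per sentence, then flatten the list
result <- data.frame(
  user = rep(reviews$user, n),
  review_Id = rep(reviews$Review_Id, n),
  sentence = unlist(sentences),
  stringsAsFactors = FALSE
)

print(result)
```

The same `rep(..., n)` pattern should work with any splitter that returns a list with one character vector per review, so a more robust sentence detector could replace the `strsplit` line if the text turns out to be messier.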