Sentence detection and extraction into the same data frame

I have the following data frame:

reviews <- data.frame(
  value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
            "Inexpensive. An improvement over integrated graphics.",
            "I love that product so excite. I will order again if I need more .",
            "Excellent card, great graphics."),
  user = c(1, 2, 3, 4),
  Review_Id = c("101968", "101968", "210546", "112546"),
  stringsAsFactors = FALSE)

And I need the following output:

user  review_Id  sentence
   1  101968     Made with high quality materials.
   1  101968     Very Good product
   2  101968     Inexpensive.
   2  101968     An improvement over integrated graphics.
   3  210546     I love that product so excite.
   3  210546     I will order again if I need more .
   4  112546     Excellent card, great graphics.

I know it is something like this: sent_detect(reviews$value)

But how can I combine this function with the data frame to get the desired output?

Source

2015-03-03 martinkabe

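Is your data really this clean? (For example, does every sentence end with a period followed by a space?) – A5C1D2H2I1M1N2O1R2T1 2015-03-03 11:19:53

If not, you can try [this](http://www.inside-r.org/packages/cran/openNLP/docs/Maxent_Sent_Token_Annotator); there is an example at the end – NicE 2015-03-03 11:25:58

If your data really is that clean, you can use cSplit from my "splitstackshape" package.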

library(splitstackshape) 
cSplit(reviews, "value", ".", direction = "long") 
#                                          value user Review_Id
# 1: Product was received in excellent condition    1    101968
# 2:            Made with high quality materials    1    101968
# 3:                           Very Good product    1    101968
# 4:                                 Inexpensive    2    101968
# 5:     An improvement over integrated graphics    2    101968
# 6:               I love that product so excite    3    210546
# 7:           I will order again if I need more    3    210546
# 8:              Excellent card, great graphics    4    112546

Source

2015-03-03 11:21:27 A5C1D2H2I1M1N2O1R2T1

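Thank you very much... this function is really great. It solved my task. Thanks again. – martinkabe 2015-03-03 12:51:44

One last question... what if my sentences do not end only with a period, but also with, for example, ! or ? How can I add that to the cSplit function? – martinkabe 2015-03-03 12:53:25

@martinkabe, you can try something like `cSplit(reviews, "value", "[.!?]", fixed = FALSE, stripWhite = FALSE, direction = "long")` to split on ".", "!" and "?". – A5C1D2H2I1M1N2O1R2T1 2015-03-03 17:00:15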
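Note that the desired output in the question keeps the sentence-ending punctuation, whereas splitting on the delimiter itself drops it. The following is a minimal base R sketch of one way to keep it, assuming sentences end in ".", "!" or "?" followed by whitespace. The helper name `split_sentences` and the `extra` example data are hypothetical and not part of splitstackshape or qdap; abbreviations such as "e.g. " would be split incorrectly by this simple rule.

```r
# Data frame from the question.
reviews <- data.frame(
  value = c("Product was received in excellent condition. Made with high quality materials. Very Good product",
            "Inexpensive. An improvement over integrated graphics.",
            "I love that product so excite. I will order again if I need more .",
            "Excellent card, great graphics."),
  user = c(1, 2, 3, 4),
  Review_Id = c("101968", "101968", "210546", "112546"),
  stringsAsFactors = FALSE)

# Hypothetical helper (an assumption for illustration, not a package function).
# Splits the text column into one row per sentence and repeats the other columns.
split_sentences <- function(df, text_col = "value") {
  # Split on whitespace that follows ., ! or ?; the lookbehind means the
  # punctuation itself is not consumed, so it stays attached to the sentence.
  pieces <- strsplit(df[[text_col]], "(?<=[.!?])\\s+", perl = TRUE)
  n <- lengths(pieces)

  # Repeat each original row once per sentence, dropping the text column.
  keep <- setdiff(names(df), text_col)
  out <- df[rep(seq_len(nrow(df)), n), keep, drop = FALSE]

  out$sentence <- trimws(unlist(pieces))
  rownames(out) <- NULL
  out
}

result <- split_sentences(reviews)
print(result)

# Made-up example with mixed terminators, to show "!" and "?" are handled too.
extra <- data.frame(value = "Great card! Would I buy it again? Yes.",
                    user = 5, stringsAsFactors = FALSE)
print(split_sentences(extra))
```

Unlike the desired output shown in the question, this sketch also returns the first sentence of the first review, so the result has eight rows.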
