2016-11-08 34 views
1

Break a paragraph into a vector of sentences in R

I have the following paragraph:

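Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)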

In order to apply the calculate_total_presence_sentiment command from the RSentiment package, I want to break this paragraph into a vector of sentences as follows:

[1] "Well, um...such a personal topic."          
[2] "No wonder I am the first to write a review."        
[3] "Suffice to say this stuff does just what they claim and tastes pleasant." 
[4] "And I had, well, major problems in this area and now I don't."   
[5] "'Nuff said."                
[6] ":-)" 

Thank you very much for your help with this.

Answers

1

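qdap has a very handy function for this:

sent_detect_nlp - Detect and split sentences on endmark boundaries using openNLP & NLP utilities, which matches the old version of the openNLP package's now-removed sentDetect function.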

library(qdap) 

txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)" 

sent_detect_nlp(txt) 
#[1] "Well, um...such a personal topic."          
#[2] "No wonder I am the first to write a review."        
#[3] "Suffice to say this stuff does just what they claim and tastes pleasant." 
#[4] "And I had, well, major problems in this area and now I don't."   
#[5] "'Nuff said."                
#[6] ":-)" 
0

A dirty solution

> data <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"
> ?"regular expression"
> strsplit(data, "(?<=[^.][.][^.])", perl=TRUE)
[[1]]
[1] "Well, um...such a personal topic. "
[2] "No wonder I am the first to write a review. "
[3] "Suffice to say this stuff does just what they claim and tastes pleasant. "
[4] "And I had, well, major problems in this area and now I don't. "
[5] "'Nuff said. "
[6] ":-)"
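The split above leaves a trailing space on every piece except the last. As a minimal sketch, assuming only base R, the same regular expression can be wrapped in a small helper that trims the pieces. The name split_sentences is hypothetical and not part of any package.

```r
# Hypothetical helper (illustration only): split on the regex used above,
# then trim the whitespace each piece carries.
split_sentences <- function(paragraph) {
  # Zero-width split point right after "<non-dot><dot><non-dot>",
  # so an ellipsis such as "um...such" is left intact.
  pieces <- strsplit(paragraph, "(?<=[^.][.][^.])", perl = TRUE)[[1]]
  # Remove the trailing space that the lookbehind keeps on each piece.
  trimws(pieces)
}

txt <- "Well, um...such a personal topic. No wonder I am the first to write a review. Suffice to say this stuff does just what they claim and tastes pleasant. And I had, well, major problems in this area and now I don't. 'Nuff said. :-)"

sentences <- split_sentences(txt)
print(sentences)
```

This stays a rough approach: the pattern only looks at periods, so sentences ending in "?" or "!" are not separated, and a period inside a decimal number or an abbreviation (for example "3.5" or "e.g.") is treated as a sentence boundary.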

Use tools from https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

-1

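You can save the text in a .txt file. Make sure each line of the .txt file contains one sentence that you want to read in as an element of the vector. Use the base function readLines('filepath/filename.txt'). The resulting character vector holds each line of the original text file as one element.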

> mylines <- readLines('text.txt') 
Warning message: 
In readLines("text.txt") : incomplete final line found on 'text.txt' 
> mylines 
[1] "Well, um...such a personal topic."          
[2] "No wonder I am the first to write a review."        
[3] "Suffice to say this stuff does just what they claim and tastes pleasant."
[4] "And I had, well, major problems in this area and now I don't."   
[5] "'Nuff said'."                
[6] ":-)" 

> mylines[3] 
[1] "Suffice to say this stuff does just what they claim and tastes pleasant."
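The readLines() approach only works once the sentences already sit on separate lines, so it reads a split rather than producing one. The round trip can be sketched as below, assuming a temporary file stands in for text.txt. The "incomplete final line" warning means the last line of the file has no end-of-line marker. writeLines() terminates every line it writes, so reading this file back should not trigger it.

```r
# One sentence per element; these are assumed to be split already.
sentences <- c(
  "Well, um...such a personal topic.",
  "No wonder I am the first to write a review.",
  "Suffice to say this stuff does just what they claim and tastes pleasant.",
  "And I had, well, major problems in this area and now I don't.",
  "'Nuff said.",
  ":-)"
)

# A temporary file stands in for 'text.txt' (an assumption for this sketch).
path <- tempfile(fileext = ".txt")
writeLines(sentences, path)  # one sentence per line, each ended by a newline

mylines <- readLines(path)   # character vector, one element per line
unlink(path)                 # clean up the temporary file

print(mylines[3])
```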