How to split a paragraph into sentences in R


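I am a beginner in R and am learning the basics. I am trying to retrieve the sentences that contain a specific word. I read the file with readLines() and use grep to pick out the matching text, but what comes back is the complete paragraph that contains the word. How can I split a paragraph into sentences in R?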

x<- readLines(filepath) 
grep("processor",x,value=TRUE,ignore.case=TRUE)

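If my word is "processor", the complete paragraph is retrieved.

Output: A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment. Coming from a brand like HP this offers you the status value and corporate services that you might need while conducting business.

But I only want the single sentence, i.e. A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment.

How can I split the paragraph into sentences so that I get only the sentences containing the specific word? And is grep suitable for this or not?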

Source

2017-12-27 sampurna

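We have no text to help you with. This is not a minimal, working, reproducible example and is likely to be closed. – hrbrmstr

Adding to @hrbrmstr's comment: please read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and update your post.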

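The quanteda package can be used to tokenize text input into sentences. Once the text has been split into sentences, grep() can be used to extract the sentences containing the word "processor" into a vector. We will use the text from the original file, stored here as a single string holding two paragraphs, and extract the sentences containing the word "processor".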

rawText <- "A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment. Coming from a brand like HP this offers you the status value and corporate services that you might need while conducting business. 
Intel® Celeron® processor N3160. Entry-level quad-core processor for general e-mail, Internet and productivity tasks. 4GB system memory for basic multitasking: Adequate high-bandwidth RAM to smoothly run multiple applications and browser tabs all at once." 

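library(quanteda)
# Tokenize into sentences rather than words
sentences <- tokens(rawText, what = "sentence")
# For each document, keep only the sentences that contain "processor"
unlist(lapply(sentences, function(x) {
    grep("processor", x, value = TRUE)
}))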

...and the output:

> unlist(lapply(sentences,function(x){ 
+  grep("processor",x,value=TRUE) 
+ })) 
text11 
"A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment." 
text12 
"Intel® Celeron® processor N3160." 
text13 
"Entry-level quad-core processor for general e-mail, Internet and productivity tasks." 
>
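
If installing quanteda is not an option, the same split-then-filter idea can be sketched in base R. This is only a rough sketch under stated assumptions: the temporary file and its two sample paragraphs are hypothetical stand-ins for the real data, and the sentence rule (split on whitespace that follows ".", "!" or "?") is a simplification that will also split after abbreviations such as "e.g.".

```r
# Hypothetical input: each paragraph sits on one line of the file,
# which is why readLines() returns a whole paragraph per element.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "A 5th Gen Core i3 processor, 8GB RAM, 15.6-inch display. Coming from a brand like HP.",
  "Entry-level quad-core Processor for e-mail. 4GB system memory for multitasking."
), path)

x <- readLines(path)  # two elements, one per paragraph

# Assumed sentence rule: split on whitespace that follows ., ! or ?
# ("15.6-inch" stays intact because no whitespace follows the dot)
sentences <- unlist(strsplit(x, "(?<=[.!?])\\s+", perl = TRUE))

# Keep only the sentences that mention the word, ignoring case
hits <- grep("processor", sentences, value = TRUE, ignore.case = TRUE)
print(hits)
```

Unlike the grep() call in the quanteda example, this one passes ignore.case = TRUE as in the question, so "Processor" at any capitalization is matched.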

Another approach is to use stringi::stri_detect_fixed() to find the string.

# stringi::stri_detect_fixed() approach 
library(stringi) 
unlist(lapply(sentences,function(x){ 
     x[stri_detect_fixed(x,"processor")] 
}))
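
The "fixed" in stri_detect_fixed() means the pattern is treated as literal text rather than as a regular expression. A small base R sketch of that difference, using grepl() with fixed = TRUE as a rough analogue (the two sample strings are made up for illustration):

```r
# Hypothetical sample strings
x <- c("15.6-inch 720p HD display", "1536 pixels wide")

# Regular expression: "." matches any character, so "1536" also matches
regex_match <- grepl("15.6", x)

# Literal match: only the string containing the exact text "15.6" matches
fixed_match <- grepl("15.6", x, fixed = TRUE)

print(regex_match)
print(fixed_match)
```

For a plain word such as "processor" both forms select the same sentences; the distinction matters once the search term contains characters like "." or "(".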

Source

2017-12-27 20:08:55

Yes, thank you for your help – sampurna
