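The quanteda package can be used to tokenize the text input into sentences. Once the document has been split into sentences, grep()
can be used to extract the sentences that contain the word "processor" into a vector. We'll take the raw text (two paragraphs stored in a single string), read it into quanteda as one document, and extract the sentences that contain "processor".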
rawText <- "A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment. Coming from a brand like HP this offers you the status value and corporate services that you might need while conducting business.
Intel® Celeron® processor N3160. Entry-level quad-core processor for general e-mail, Internet and productivity tasks. 4GB system memory for basic multitasking: Adequate high-bandwidth RAM to smoothly run multiple applications and browser tabs all at once."
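library(quanteda)
# split rawText into sentence tokens
sentences <- tokens(rawText, what = "sentence")
# keep the sentences that match "processor" and flatten them into one named vector
unlist(lapply(sentences,function(x){
grep("processor",x,value=TRUE)
}))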
...and the output:
> unlist(lapply(sentences,function(x){
+ grep("processor",x,value=TRUE)
+ }))
text11
"A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment."
text12
"Intel® Celeron® processor N3160."
text13
"Entry-level quad-core processor for general e-mail, Internet and productivity tasks."
>
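If quanteda is not available, a rough base-R approximation of the same pipeline is to split on sentence-ending punctuation followed by whitespace and then call grep(). This is a minimal sketch and not quanteda's method. The split rule `(?<=[.!?])\\s+` is an assumption that happens to work for this text but will mis-split abbreviations such as "e.g. " or "Dr. ". The ® signs are written as the escape `\u00ae` so the script does not depend on the source file's encoding.

```r
# Same example text as above; "\u00ae" is the registered sign (®)
rawText <- paste(
  "A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment. Coming from a brand like HP this offers you the status value and corporate services that you might need while conducting business.",
  "Intel\u00ae Celeron\u00ae processor N3160. Entry-level quad-core processor for general e-mail, Internet and productivity tasks. 4GB system memory for basic multitasking: Adequate high-bandwidth RAM to smoothly run multiple applications and browser tabs all at once.",
  sep = "\n"
)

# Crude sentence split (assumption): break after ., ! or ? when followed by whitespace.
# The lookbehind keeps the punctuation attached to the preceding sentence.
sentence_vec <- strsplit(rawText, "(?<=[.!?])\\s+", perl = TRUE)[[1]]

# Keep only the sentences that contain "processor"
processor_sentences <- grep("processor", sentence_vec, value = TRUE)
print(processor_sentences)
```

On this text the split produces five sentences, and the same three sentences as in the quanteda output above are returned (unnamed, because strsplit() does not carry document names).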
Another approach is to use stringi::stri_detect_fixed()
to find the string.
# stringi::stri_detect_fixed() approach
library(stringi)
unlist(lapply(sentences,function(x){
x[stri_detect_fixed(x,"processor")]
}))
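The same literal match can also be done without stringi. Base R's grepl() with `fixed = TRUE` treats the pattern as a plain string rather than a regular expression, which is the kind of match stri_detect_fixed() performs. This is a minimal sketch. `sentences_list` is a hypothetical, hand-built stand-in for the tokenized sentences (abridged sentences in a named list), used only so the example runs without quanteda.

```r
# Hypothetical stand-in for the tokenized sentences: one character
# vector of sentences per document, named like quanteda's default docnames
sentences_list <- list(
  text1 = c("A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor.",
            "Coming from a brand like HP this offers you the status value.",
            "Entry-level quad-core processor for general e-mail.")
)

# grepl(fixed = TRUE) does a literal (non-regex) substring test,
# returning TRUE/FALSE per sentence; use it to subset x
hits <- unlist(lapply(sentences_list, function(x) {
  x[grepl("processor", x, fixed = TRUE)]
}))
print(hits)
```

As in the quanteda output, unlist() builds the names from the document name plus a running index ("text11", "text12"). Dropping `fixed = TRUE` falls back to regular-expression matching, which is what grep() does in the first approach.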
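There's no text here for us to help you with. This isn't a minimal, working, reproducible example, and the question will likely be closed. – hrbrmstr
To add to @hrbrmstr's comment: please read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and update your post. –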