Extracting sentences from a character vector that satisfy two conditions in R

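Suppose we have a full text document loaded into R as a character vector. I am looking for code that extracts all the text between two "." characters, where the text between those two periods contains "and" as well as at least one "%".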

character <- as.character("Walmart stocks remained the same. Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same. And the percent of increase for Best Buy was 2.5%.")

For this simple example, I am hoping for output along the lines of:

[1] Sony reported an increase, and the percent was posted at 1.0%. 
[2] And the percent of increase for Best Buy was 2.5%.

Source

2017-08-09 Kevin Ocampo

A quick solution:

library(magrittr) 
"Walmart stocks remained the same. Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same. And the percent of increase for Best Buy was 2.5%." %>% 
    ## split the string at the sentence boundaries 
    gsub("\\.\\s", "\\.\t", .) %>% 
    strsplit("\\t") %>% unlist() %>% 
    ## keep only sentences that contain "and the" (irrespective of case) 
    grep("and the", x = ., value = TRUE, ignore.case = TRUE) %>% 
    ## keep only the sentences that end with %. 
    grep("%\\.$", x = ., value = TRUE) %>% 
    ## remove leading white spaces 
    gsub("^\\s?", "", x = .)
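
The pipeline above keeps only sentences that end in "%.", while the question asks for at least one "%" anywhere in the sentence. A base R variant that follows the question's wording more literally might look like the sketch below. The function name `extract_sentences` and its `word` argument are hypothetical, and the sketch assumes sentences are separated by a period followed by whitespace.

```r
# Hypothetical helper: split text into sentences, then keep those that
# contain the given word (any case) and at least one "%".
extract_sentences <- function(text, word = "and") {
  # Split at whitespace that follows a period; perl = TRUE enables lookbehind.
  # "1.0%" is not split because its period is followed by a digit.
  sentences <- unlist(strsplit(text, "(?<=\\.)\\s+", perl = TRUE))
  # `word` is pasted into a regex, so it should not contain metacharacters.
  has_word <- grepl(paste0("\\b", word, "\\b"), sentences, ignore.case = TRUE)
  has_pct <- grepl("%", sentences, fixed = TRUE)
  trimws(sentences[has_word & has_pct])
}

text <- "Walmart stocks remained the same. Sony reported an increase, and the percent was posted at 1.0%. And the google also remained the same. And the percent of increase for Best Buy was 2.5%."
result <- extract_sentences(text)
print(result)
```

Passing `word = "and the"` would reproduce the phrase filter used in the pipeline above.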

Source

2017-08-09 17:04:20 sinQueso

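Works like a charm! The only problem came up in my application when using large text files from the web: the files are so long that sentences get cut off and continue on the next line. So I converted the whole text file into a single character vector by wrapping my readLines call in paste, like this: 'paste(readLines("websiteurl.txt"), collapse = "") %>%'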
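
The line-wrapping issue described in this comment can be reproduced with a small temporary file. This is only a sketch: the file contents are made up for illustration, and it assumes a single space as the `collapse` separator so that words at line ends do not run together and a period at the end of a line is still followed by whitespace.

```r
# Simulate a text file whose sentences wrap across lines (assumed layout).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sony reported an increase, and the percent",
             "was posted at 1.0%. And the google also",
             "remained the same."), tmp)

# Collapse all lines into one string, separated by a single space.
full_text <- paste(readLines(tmp), collapse = " ")

# Split into sentences at whitespace that follows a period.
sentences <- unlist(strsplit(full_text, "(?<=\\.)\\s+", perl = TRUE))
print(sentences)

unlink(tmp)
```

After collapsing, sentence splitting no longer depends on where the original line breaks fell.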
