How to find rows that contain words from a given word list? Not just one particular word: any word in the list counts

I have a given list of words, for example:

words <- c("breast","cancer","chemotherapy")

I also have a very large data frame with 1 variable and more than 10,000 entries (rows).

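I want to select all rows that contain any of the words in "words". Not just one particular word: any word in "words" counts. Rows that contain several words from "words" should count as well.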

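If I knew what these words were, I could call a stringr extraction function several times. However, the words change every time and are not known in advance. Is there a direct way to do this?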

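Also, is it possible to select all rows that contain two or more of the words in "words"? For example, a row containing only "cancer" does not count, but a row containing both "breast" and "cancer" does. Again, these words change every time and are not known in advance. Is there a direct way?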


2016-11-26 user7107092

Some sample data:

words <- c("breast","cancer","chemotherapy") 
df <- data.frame(v1 = c("there was nothing found","the chemotherapy is effective","no cancer no chemotherapy","the breast looked normal","something"))

You can use a combination of grepl, sapply and rowSums:

df[rowSums(sapply(words, grepl, df$v1)) > 0, , drop = FALSE]

This gives:

                             v1
2 the chemotherapy is effective
3     no cancer no chemotherapy
4      the breast looked normal
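To see why this works, here is a minimal sketch on the same sample data that prints the intermediate steps (stringsAsFactors = FALSE is added only so the column is character in every R version; grepl also accepts a factor column). sapply(words, grepl, df$v1) calls grepl(word, df$v1) once per word and binds the results into a logical matrix with one row per line of df and one column per word; rowSums then counts the TRUE values in each row.

```r
words <- c("breast", "cancer", "chemotherapy")
df <- data.frame(
  v1 = c("there was nothing found", "the chemotherapy is effective",
         "no cancer no chemotherapy", "the breast looked normal", "something"),
  stringsAsFactors = FALSE
)

# Logical matrix: one row per line of df, one column per word,
# TRUE where that word occurs somewhere in that line
hits <- sapply(words, grepl, df$v1)
print(hits)

# Number of distinct words from `words` found in each row
counts <- rowSums(hits)
print(counts)

# Rows with at least one match (the "> 0" selection)
print(which(counts > 0))
```

Because grepl only reports whether a word occurs, each row's count is the number of distinct words from "words" it contains, not the number of occurrences. That is what the "> 0" (any word) and "> 1" (at least two words) thresholds rely on.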

If you only want to select rows that contain at least two of the words, use:

df[rowSums(sapply(words, grepl, df$v1)) > 1, , drop = FALSE]

Result:

                         v1
3 no cancer no chemotherapy
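Since the word list changes every time, it may be convenient to wrap both cases in one function with the threshold as a parameter. The sketch below is illustrative only: select_rows_by_words and its min_words argument are hypothetical names, and it swaps sapply/rowSums for a simple loop that adds up one grepl result per word (which also avoids sapply simplifying to a vector when the data frame has a single row). It passes fixed = TRUE so each word is matched literally, which assumes the words are plain text rather than regular expressions.

```r
# Hypothetical helper (not from the answer above): keep the rows of `df` whose
# column `col` contains at least `min_words` distinct entries of `words`.
select_rows_by_words <- function(df, col, words, min_words = 1) {
  x <- as.character(df[[col]])  # works whether the column is character or factor
  counts <- integer(length(x))
  for (w in words) {
    # fixed = TRUE: treat each word as a literal string, not a regex;
    # adding a logical vector to an integer vector counts TRUE as 1
    counts <- counts + grepl(w, x, fixed = TRUE)
  }
  # drop = FALSE keeps a data frame even when df has only one column
  df[counts >= min_words, , drop = FALSE]
}

words <- c("breast", "cancer", "chemotherapy")
df <- data.frame(
  v1 = c("there was nothing found", "the chemotherapy is effective",
         "no cancer no chemotherapy", "the breast looked normal", "something"),
  stringsAsFactors = FALSE
)

any_word <- select_rows_by_words(df, "v1", words)                   # like "> 0"
two_words <- select_rows_by_words(df, "v1", words, min_words = 2)  # like "> 1"
print(any_word)
print(two_words)
```

Either way the match is a plain substring match, so "cancer" would also match inside "cancerous"; restricting it to whole words would need a regular expression such as "\\bcancer\\b" instead of fixed = TRUE.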

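Note: you need drop = FALSE because your data frame has only one variable (column); without it, selecting rows of a one-column data frame simplifies the result to a plain vector. If your data frame has more than one variable (column), drop = FALSE is not needed.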
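The effect of drop = FALSE can be checked directly. A small sketch with made-up toy data (the values "a", "b", "c" and the names one_col, two_cols and keep are placeholders, not from the question):

```r
one_col <- data.frame(v1 = c("a", "b", "c"), stringsAsFactors = FALSE)
two_cols <- data.frame(v1 = c("a", "b", "c"), v2 = 1:3, stringsAsFactors = FALSE)
keep <- c(TRUE, FALSE, TRUE)

# One column, default drop: the result is simplified to a character vector
without_drop <- one_col[keep, ]
# One column, drop = FALSE: the result stays a one-column data frame
with_drop <- one_col[keep, , drop = FALSE]
# Two columns: selecting rows already returns a data frame
rows_of_two <- two_cols[keep, ]

str(without_drop)
str(with_drop)
str(rows_of_two)
```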


2016-11-26 08:03:26 h3rm4n
