R: Search for the words of one string in another string and return the non-matching words

My data table has two text columns (col1 and col2). Both contain sentences. I want to look up every word of col1 in col2 and return a string containing the words of col1 that are not found in col2. Here is an example:

col1                            | col2                 | output
america, uk have too much money | uk, uk money too too | america, have much

2017-05-25 Oshan

Have you tried anything so far? – Jan

Something like this?

library(data.table)
DT <- data.table(col1 = "america, uk have too much money", col2 = "uk, uk money too too")
# Split on whitespace, or just before any punctuation mark other than an apostrophe
pat <- "(\\s+)|(?!')(?=[[:punct:]])"
# Row by row, keep the col1 tokens that never appear in col2
DT[, output := mapply(function(a, b) paste(a[!(a %in% b)], collapse = " "),
                      strsplit(col1, pat, perl = TRUE), strsplit(col2, pat, perl = TRUE))]

The result has no commas, though.
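If the punctuation from col1 should survive, as in the expected output in the question, one possible variation is to split on whitespace only and strip punctuation just for the comparison. This is a minimal sketch, not part of the answer above: `unmatched_words` and its inner `key` helper are made-up names, and the case-insensitive matching is an assumption the question does not state.

```r
# Hypothetical helper: return the words of `x` that do not occur in `y`,
# keeping any punctuation attached to the surviving words of `x`.
unmatched_words <- function(x, y) {
  # Comparison key: drop punctuation and lower-case, so "uk," matches "uk"
  # (lower-casing is an assumption, remove tolower() for exact-case matching)
  key <- function(tokens) tolower(gsub("[[:punct:]]", "", tokens))
  mapply(function(a, b) {
    # Keep the original tokens of x whose key is absent from y's keys
    paste(a[!(key(a) %in% key(b))], collapse = " ")
  }, strsplit(x, "\\s+"), strsplit(y, "\\s+"), USE.NAMES = FALSE)
}

col1 <- c("america, uk have too much money", "the cat sat")
col2 <- c("uk, uk money too too", "a cat")
result <- unmatched_words(col1, col2)
print(result)
```

Because the function is vectorised over both arguments, it should also work on whole columns, for example `DT[, output := unmatched_words(col1, col2)]` with data.table. Note that stripping all punctuation also removes apostrophes from the comparison key, so "don't" and "dont" would be treated as the same word.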

2017-05-25 12:51:10 simone

[See here](https://stackoverflow.com/questions/22235288/strsplit-on-all-spaces-and-punctuation-except-apostrophes) – simone

Thanks @simone .. – Oshan
