Match the highest-ranked word in a data frame text column in R
I have two data frames. DF1:
Sentence <- c("A large bunch of purple grapes", "large green potato sack", "small red tomatoes", "yellow and black bananas")
df1 <- data.frame(Sentence)
DF2:
Word <- c("green", "purple", "grapes", "small", "sack", "yellow", "bananas", "large")
Rank <- c(20,18,22,16,15,17,6,12)
df2 <- data.frame(Word,Rank)
DF1:
ID Sentence
1 A large bunch of purple grapes
2 large green potato sack
3 small red tomatoes
4 yellow and black bananas
DF2:
ID Word Rank
1 green 20
2 purple 18
3 grapes 22
4 small 16
5 sack 15
6 yellow 17
7 bananas 6
8 large 12
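What I want to do is match the words in df2 against the words contained in the "Sentence" column, and insert a new column into df1 holding the highest-ranked matching word from df2. So, something like this: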
DF1:
ID Sentence Word
1 A large bunch of purple grapes grapes
2 large green potato sack green
3 small red tomatoes small
4 yellow and black bananas yellow
I initially used the code below to match the words, but of course this creates a column containing all of the matched words:
x <- sapply(df2$Word, function(x) grepl(tolower(x), tolower(df1$Sentence)))
df1$top_match <- apply(x, 1, function(i) paste0(names(i)[i], collapse = " "))
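If a sentence has no match in `df2`, do you want to just return `NA`, or some other text? In this case all of the sentences have a match, but I just want to make sure you aren't looking for something more general. – useR
Yes, returning N/A is fine, thanks! – Jammin
Also, could you provide your data as `dput(df1)` and `dput(df2)`, or the code you used to generate them? – useR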
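Since returning NA for unmatched sentences is acceptable, one possible approach can be sketched as follows. This is only a sketch under a few assumptions: the text column of df1 is named `Sentence` (as in the printed table), matching is case-insensitive substring matching as in the `grepl` attempt above, and `top_word` is a hypothetical helper name introduced here for illustration.

```r
# Sample data; assumes the text column is called Sentence, as in the printed table
df1 <- data.frame(
  Sentence = c("A large bunch of purple grapes", "large green potato sack",
               "small red tomatoes", "yellow and black bananas"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Word = c("green", "purple", "grapes", "small", "sack", "yellow", "bananas", "large"),
  Rank = c(20, 18, 22, 16, 15, 17, 6, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the highest-ranked word found in one sentence,
# or NA if none of the words occur in it
top_word <- function(sentence, words, ranks) {
  # Case-insensitive, literal substring test for every candidate word
  found <- vapply(
    words,
    function(w) grepl(tolower(w), tolower(sentence), fixed = TRUE),
    logical(1)
  )
  if (!any(found)) return(NA_character_)
  # Among the matched words, keep the one with the largest Rank
  words[found][which.max(ranks[found])]
}

df1$top_match <- vapply(
  df1$Sentence, top_word, character(1),
  words = df2$Word, ranks = df2$Rank,
  USE.NAMES = FALSE
)

df1$top_match
# [1] "grapes" "green"  "small"  "yellow"
```

Note that this matches substrings, so "sack" would also be found inside a word such as "rucksack". If whole-word matching is needed, a regular expression with `\\b` word boundaries in place of `fixed = TRUE` is one option. On ties, `which.max` keeps the first word with the maximum rank.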