How to query a data frame in R using a column of another data frame

I have two data frames in R, and I want to query data frame "x" using the values in data frame "y" as the search parameters.

I have this code:

x <- c('The book is on the table','I hear birds outside','The electricity 
came back') 
x <- data.frame(x) 
colnames(x) <- c('text') 
x 

y <- c('book','birds','electricity') 
y <- data.frame(y) 
colnames(y) <- c('search') 
y 

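library(sqldf)  # provides sqldf()

# IN compares whole values, so a full sentence never equals a single word
r <- sqldf("select * from x where text IN (select search from y)") 
r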

I think LIKE should be used here, but I'm not sure how. Can you help?

Source

2017-08-11 Robson Brandão

If you want an sqldf solution, I think this will work:

sqldf("select x.text, y.search FROM x JOIN y on x.text LIKE '%' || y.search || '%'") 

##                          text      search
## 1    The book is on the table        book
## 2        I hear birds outside       birds
## 3 The electricity \ncame back electricity
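
For readers without sqldf, the same substring join can be sketched in base R. This is only an illustration, not part of the answer above: the names `pairs`, `hit`, and `result` are made up here, and the data frames are rebuilt with `stringsAsFactors = FALSE` so the columns are plain character vectors. One difference to keep in mind is that SQLite's LIKE ignores case for ASCII letters by default, while `grepl()` is case-sensitive.

```r
# Rebuild the question's data as character columns.
x <- data.frame(text = c("The book is on the table",
                         "I hear birds outside",
                         "The electricity \ncame back"),
                stringsAsFactors = FALSE)
y <- data.frame(search = c("book", "birds", "electricity"),
                stringsAsFactors = FALSE)

# by = NULL makes merge() return every (text, search) combination.
pairs <- merge(x, y, by = NULL)

# Keep the pairs where `search` occurs literally inside `text`.
# fixed = TRUE turns off regex interpretation of the search term.
hit <- mapply(grepl, pairs$search, pairs$text,
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

result <- pairs[hit, ]
rownames(result) <- NULL
result
```

With the three sentences and three terms above, `result` should hold one row per sentence, each paired with the word it contains.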

Source

2017-08-11 16:21:44

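Thanks. What if I need the query to filter "x" using each word in "y"?

Isn't that what the code does? If you modify 'y' to contain only one or two items (e.g. y <- c('book', 'birds')), you will see that only the first two rows of 'x' are shown. What is it that you want?

Thanks, it works perfectly.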

You can use the fuzzyjoin package:

library(dplyr) 
library(fuzzyjoin) 

regex_join(
    mutate_if(x, is.factor, as.character), 
    mutate_if(y, is.factor, as.character), 
    by = c("text" = "search") 
) 

#                          text      search
# 1    The book is on the table        book
# 2        I hear birds outside       birds
# 3 The electricity \ncame back electricity

Source

2017-08-11 15:39:40

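It's hard to know whether this is what you want without a more varied test fixture. To add a little variety, I added one extra word to y$search: y = c('book','birds','electricity', 'cat'). More variety would clarify things further.

Just want to know which words appear in which sentences? Use sapply and grepl: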

> m = sapply(y$search, grepl, x$text) 
> rownames(m) = x$text 
> colnames(m) = y$search 
> m 
                             book birds electricity   cat
The book is on the table     TRUE FALSE       FALSE FALSE
I hear birds outside        FALSE  TRUE       FALSE FALSE
The electricity \ncame back FALSE FALSE        TRUE FALSE

Want to pull out just the matching rows?

> library(magrittr) # To use the pipe, "%>%" 
> x %>% data.table::setDT() # To return the result as a table easily 
> 
> x[(sapply(y$search, grepl, x$text) %>% rowSums() %>% as.logical()) * (1:nrow(x)), ] 
                          text
1:    The book is on the table
2:        I hear birds outside
3: The electricity \ncame back
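
The same row filter can also be sketched in base R with a plain logical index, which avoids multiplying by the row numbers. This is an alternative illustration, not the code from this answer. The fourth sentence is a hypothetical addition so that the filter visibly drops a row, and `keep` and `matched` are names invented for the sketch.

```r
# Question data plus one hypothetical sentence that matches nothing.
x <- data.frame(text = c("The book is on the table",
                         "I hear birds outside",
                         "The electricity \ncame back",
                         "Nothing matches here"),
                stringsAsFactors = FALSE)
y <- data.frame(search = c("book", "birds", "electricity", "cat"),
                stringsAsFactors = FALSE)

# Logical matrix: one row per sentence, one column per search term.
m <- sapply(y$search, grepl, x$text, fixed = TRUE)

# TRUE for sentences that contain at least one search term.
keep <- rowSums(m) > 0

# drop = FALSE keeps the result a data frame even with one column.
matched <- x[keep, , drop = FALSE]
matched
```

Note that `sapply()` only returns a matrix here because `x` has more than one row; with a single sentence it would simplify to a vector and `rowSums()` would fail.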

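@Aurèle's solution gives the best result if you want each text paired with the search term it matched. Note that if back were also in y$search, the text The electricity \ncame back would be reported twice in the result, once for each matching search term, so that approach is the better choice when uniqueness doesn't matter.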

So it largely depends on the output you want.
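
The duplication mentioned above can be reproduced with a small base R sketch. It assumes 'back' has been added to the search terms, as in the scenario described, and uses made-up names (`idx`, `pairs`) in place of the join packages.

```r
x <- data.frame(text = c("The book is on the table",
                         "I hear birds outside",
                         "The electricity \ncame back"),
                stringsAsFactors = FALSE)
# 'back' is the extra term from the scenario above.
y <- data.frame(search = c("book", "birds", "electricity", "back"),
                stringsAsFactors = FALSE)

# Logical matrix: sentences in rows, search terms in columns.
m <- sapply(y$search, grepl, x$text, fixed = TRUE)

# Pairwise view: one row per (sentence, term) match.
idx <- which(m, arr.ind = TRUE)
pairs <- data.frame(text = x$text[idx[, "row"]],
                    search = y$search[idx[, "col"]],
                    stringsAsFactors = FALSE)
pairs

# The third sentence appears once per matching term.
sum(pairs$text == x$text[3])

# Collapsing to unique sentences removes the repeat.
unique(pairs$text)
```

Here the third sentence matches both 'electricity' and 'back', so the pairwise view has four rows while only three distinct sentences match.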

Source

2017-08-11 15:55:09
