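Get counts of specific values from another data frame

This question may sound similar to others, but I hope it is different. I want to take a specific list of values and count how often each one occurs in another list of values, with values that never occur reported as 0.

I have a data frame (df1) containing a column named 'Items':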

Items <- c('Carrots','Plums','Pineapple','Turkey') 
df1<-data.frame(Items) 

>df1 
Items 
1 Carrots 
2  Plums 
3 Pineapple 
4 Turkey

And a second data frame (df2):

> head(df2,n=10) 
    ID  Date  Thing 
1 58150 2012-09-12 Potatoes 
2 12357 2012-09-28 Turnips 
3 50788 2012-10-04 Oranges 
4 66038 2012-10-11 Potatoes 
5 18119 2012-10-11 Oranges 
6 48349 2012-10-14 Carrots 
7 23328 2012-10-16 Peppers 
8 66038 2012-10-26 Pineapple 
9 32717 2012-10-28 Turnips 
10 11345 2012-11-08 Oranges

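I know that 'Turkey' appears only in df1 and not in df2. I want to return a frequency table, or counts, of the df1 items that appear in df2, and to return a count of '0' for Turkey.

How can I summarize values using a column from another data frame? The closest I have got is: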

df2%>% count (Thing) %>% filter(Thing %in% df1$Items,)

But this returns only the items that df1 and df2 have in common, so 'Turkey' is excluded. So close!

> df2%>% count (Thing) %>% filter(Thing %in% df1$Items,) 
# A tibble: 3 x 2 
     Thing  n 
    <fctr> <int> 
1 Carrots 30 
2 Pineapple 30 
3  Plums 38

I would like my output to look like this:

1 Carrots 30 
2 Pineapple 30 
3  Plums 38 
4 Turkey  0

I am new to R and completely new to dplyr.

Source

2017-09-14 gzrcm

I use this kind of thing all the time. I am sure there is a smarter way to code it, but this is what I have:

item <- vector() 
count <- vector() 
items <- list(unique(df1$Items)) 

for (i in 1:length(items)){ 
    item[i] <- items[i] 
    count[i] <- sum(df2$Thing == item) 
} 

df3 <- data.frame(cbind(item, count))

Hope this helps!

Source

2017-09-14 14:10:26

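Thanks Stephen, I received a length warning: 'longer object length is not a multiple of shorter object length' – gzrcm

Ah, I think I know why. The code above looked at every item, not just the unique items. I have updated my comment. –

I still get the same error, but I see what your script is trying to achieve. The df1 I created comes from a vector. Is there any way to simplify the for loop by using the original vector? – gzrcm

Stephen's solution, slightly modified by adding [i] at the end of the count[i] line. See below: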

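items <- unique(Items)   # the original character vector, de-duplicated
item <- character(length(items)) 
count <- integer(length(items)) 

for (i in seq_along(items)){ 
    item[i] <- items[i] 
    # compare against the single current item, hence item[i]
    count[i] <- sum(df2$Thing == item[i]) 
} 

# build the data frame directly so that count stays numeric
df3 <- data.frame(item, count) 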

> df3 
     item count 
1 Carrots 30 
2  Plums 38 
3 Pineapple 30 
4 Turkey  0
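The loop above can also be written without explicit iteration. The following is only a minimal base R sketch, assuming `Items` is the original character vector and using the 10-row `head(df2)` sample from the question as stand-in data (the real df2 is larger, so the real counts differ):

```r
# Stand-in data: Items as in the question; df2 reduced to the Thing column
# of the 10 rows shown by head(df2, n = 10).
Items <- c("Carrots", "Plums", "Pineapple", "Turkey")
df2 <- data.frame(
  Thing = c("Potatoes", "Turnips", "Oranges", "Potatoes", "Oranges",
            "Carrots", "Peppers", "Pineapple", "Turnips", "Oranges"),
  stringsAsFactors = FALSE
)

# Turning Thing into a factor whose levels are exactly Items makes table()
# report every item, including those that never occur. Values of Thing that
# are not in Items become NA, which table() ignores by default.
counts <- table(factor(df2$Thing, levels = Items))

df3 <- data.frame(item = names(counts), count = as.integer(counts),
                  stringsAsFactors = FALSE)
df3
```

Because the factor levels are fixed up front, items such as Turkey get a count of 0 instead of being dropped, and the rows come out in the order of `Items`.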

Source

2017-09-14 14:46:50 gzrcm

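dplyr drops 0 count rows, and you have the added complication that the possible categories of Thing differ between your two datasets.

If you add the factor levels from df1 to df2, you can use complete from tidyr, which is a common way to add 0 count rows.

I am using a handy function from the forcats package called fct_expand to add the factor levels from df1 to df2.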

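library(dplyr) 
library(tidyr) 
library(forcats) 

df2 %>% 
    # add the items from df1 as extra factor levels of Thing
    mutate(Thing = fct_expand(Thing, as.character(df1$Items))) %>% 
    count(Thing) %>% 
    # add a row with n = 0 for every level that has no rows
    complete(Thing, fill = list(n = 0)) %>% 
    # keep only the items listed in df1
    filter(Thing %in% df1$Items)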
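As a variation on the same idea (a different technique from the one above, not part of the original answer), `count()` accepts `.drop = FALSE` in dplyr 0.8.0 and later, which keeps empty factor levels without needing `complete()`. A minimal sketch, assuming such a dplyr version and using the 10-row sample from the question as stand-in data:

```r
library(dplyr)

# Stand-in data based on the question (df2 is only the 10-row sample).
df1 <- data.frame(Items = c("Carrots", "Plums", "Pineapple", "Turkey"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(
  Thing = c("Potatoes", "Turnips", "Oranges", "Potatoes", "Oranges",
            "Carrots", "Peppers", "Pineapple", "Turnips", "Oranges"),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  # keep only rows whose Thing is one of the items of interest
  filter(Thing %in% df1$Items) %>%
  # make the items of interest the complete set of factor levels
  mutate(Thing = factor(Thing, levels = df1$Items)) %>%
  # .drop = FALSE keeps levels with no rows, giving them n = 0
  count(Thing, .drop = FALSE)

res
```

Filtering before converting to a factor avoids an extra NA group for things such as Potatoes that are not in df1.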

Source

2017-09-14 15:07:55 aosmith

Thanks aosmith! This works too. – gzrcm

A different approach is to aggregate df2 first, right join with df1 (to pick all rows of df1), and replace NA with zero.

library(dplyr) 
df2 %>% 
    count(Thing) %>% 
    right_join(unique(df1), by = c("Thing" = "Items")) %>% 
    mutate(n = coalesce(n, 0L))

# A tibble: 4 x 2 
     Thing  n 
     <chr> <int> 
1 Carrots  1 
2  Plums  0 
3 Pineapple  1 
4 Turkey  0 
Warning message: 
Column `Thing`/`Items` joining factors with different levels, coercing to character vector

The same approach in data.table:

library(data.table) 
setDT(df2)[, .N, by = Thing][unique(setDT(df1)), on = .(Thing = Items)][is.na(N), N := 0L][]

 Thing N 
1: Carrots 1 
2:  Plums 0 
3: Pineapple 1 
4: Turkey 0

Note that in both implementations unique(df1) is used to avoid unintended duplicate rows after joining.
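To illustrate why the de-duplication matters, here is a small sketch with made-up data; `df1_dup` is a hypothetical version of df1 in which Carrots is listed twice, and the three-row df2 is also invented for the example:

```r
library(dplyr)

# Hypothetical data: df1_dup lists Carrots twice.
df2 <- data.frame(Thing = c("Carrots", "Oranges", "Pineapple"),
                  stringsAsFactors = FALSE)
df1_dup <- data.frame(Items = c("Carrots", "Carrots", "Turkey"),
                      stringsAsFactors = FALSE)

counts <- count(df2, Thing)

# A right join returns one row per row of the right-hand table, so the
# duplicated item shows up twice in the result.
with_dups <- right_join(counts, df1_dup, by = c("Thing" = "Items"))

# De-duplicating first gives one row per distinct item.
without_dups <- right_join(counts, unique(df1_dup), by = c("Thing" = "Items"))

nrow(with_dups)
nrow(without_dups)
```

With the question's df1 the items are already distinct, so `unique()` is only a safeguard there.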

If df2 is large and df1 contains only a few Items, it might be more efficient to join first and then aggregate:

library(dplyr) 
df2 %>% 
    right_join(unique(df1), by = c("Thing" = "Items")) %>% 
    group_by(Thing) %>% 
    summarise(n = sum(!is.na(ID)))

# A tibble: 4 x 2 
     Thing  n 
     <chr> <int> 
1 Carrots  1 
2 Pineapple  1 
3  Plums  0 
4 Turkey  0 
Warning message: 
Column `Thing`/`Items` joining factors with different levels, coercing to character vector

Likewise in data.table syntax:

library(data.table) 
setDT(df2)[unique(setDT(df1)), on = .(Thing = Items)][, .(N = sum(!is.na(ID))), by = Thing][]

 Thing N 
1: Carrots 1 
2:  Plums 0 
3: Pineapple 1 
4: Turkey 0
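The join-then-aggregate variants count with `sum(!is.na(ID))` rather than `n()` because the right join still produces one row for an item that has no match in df2, with its df2 columns set to NA. A small sketch with made-up data, assuming ID is never missing in genuine df2 rows:

```r
library(dplyr)

# Hypothetical miniature versions of df1 and df2.
df1 <- data.frame(Items = c("Carrots", "Turkey"), stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(48349, 66038, 18119),
                  Thing = c("Carrots", "Pineapple", "Oranges"),
                  stringsAsFactors = FALSE)

# Turkey has no match in df2, so its joined row has ID = NA.
joined <- right_join(df2, df1, by = c("Thing" = "Items"))

res <- joined %>%
  group_by(Thing) %>%
  summarise(rows = n(),             # counts the placeholder row as well
            n = sum(!is.na(ID)))    # counts only rows that came from df2

res
```

Here `rows` is 1 for Turkey even though it never occurs in df2, while `n` is 0, which is the count the question asks for.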

Source

2017-09-14 16:28:29 Uwe

Thanks Uwe! Your solution works – gzrcm
