
2016-12-07

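Combine data frames by common IDs

I have two datasets of lobster egg size data collected by different samplers, which will be used to assess measurement variability. Each sampler measured ~50 eggs per lobster from multiple lobsters. However, occasionally a lobster was processed by Sampler 1 but not by Sampler 2, and vice versa. I want to combine the data from both samplers into a new dataset, but remove all data for lobsters that were processed by only one sampler. I have played around with semi_join and intersect in dplyr, but I need to perform the match in both directions, dataset 1 -> 2 and 2 -> 1. I was able to create a new dataset that binds the rows from both samplers, but it is not clear to me how to remove every lobster ID in the new dataset that appears in only one of the two datasets.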

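Here is a simplified version of my data, in which multiple egg area measurements are taken from multiple lobsters, but the sampling does not always overlap (i.e., a lobster's eggs were measured by only one sampler and not by the other):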

install.packages("dplyr")  # the package name must be quoted
library(dplyr) 

# stringsAsFactors=FALSE keeps the IDs as character (the default only from R 4.0.0 onward)
sampler1 <- data.frame(LobsterID=c("Lobster1","Lobster1","Lobster2", 
            "Lobster2","Lobster2","Lobster2", 
            "Lobster2","Lobster3","Lobster3","Lobster3"), 
         Area=c(.4,.35,1.1,1.04,1.14,1.1,1.05,1.7,1.63,1.8), 
         Sampler=rep("Sampler1", 10), 
         stringsAsFactors=FALSE) 
sampler2 <- data.frame(LobsterID=c("Lobster1","Lobster1","Lobster1", 
            "Lobster1","Lobster1","Lobster2", 
            "Lobster2","Lobster2","Lobster4","Lobster4"), 
         Area=c(.41,.44,.47,.43,.38,1.14,1.11,1.09,1.41,1.4), 
         Sampler=rep("Sampler2", 10), 
         stringsAsFactors=FALSE) 

combined <- bind_rows(sampler1, sampler2) 

# Rows 8-10 (Lobster3) and 19-20 (Lobster4) were measured by only one sampler
desiredresult <- combined[-c(8, 9, 10, 19, 20), ] 

The last line of the script gives the desired result for the mock data. I was hoping to restrict this to base R or dplyr.
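Since the question mentions trying semi_join in one direction only, here is a minimal sketch of running it in both directions and stacking the two results. It assumes dplyr is installed and rebuilds the toy data with character IDs; the name `both` is illustrative only.

```r
library(dplyr)

# Toy data from the question, with character columns
sampler1 <- data.frame(
  LobsterID = c(rep("Lobster1", 2), rep("Lobster2", 5), rep("Lobster3", 3)),
  Area = c(.4, .35, 1.1, 1.04, 1.14, 1.1, 1.05, 1.7, 1.63, 1.8),
  Sampler = rep("Sampler1", 10),
  stringsAsFactors = FALSE
)
sampler2 <- data.frame(
  LobsterID = c(rep("Lobster1", 5), rep("Lobster2", 3), rep("Lobster4", 2)),
  Area = c(.41, .44, .47, .43, .38, 1.14, 1.11, 1.09, 1.41, 1.4),
  Sampler = rep("Sampler2", 10),
  stringsAsFactors = FALSE
)

# semi_join(x, y) keeps the rows of x whose LobsterID also occurs in y,
# so doing it both ways covers the 1 -> 2 and 2 -> 1 matches
both <- bind_rows(
  semi_join(sampler1, sampler2, by = "LobsterID"),
  semi_join(sampler2, sampler1, by = "LobsterID")
)
```

Each semi_join only filters the rows of its first argument and never adds columns from the second, so `both` should hold the same 15 measurements as `desiredresult`.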

Answers

sampler1 %>% rbind(sampler2) %>% filter(LobsterID %in% intersect(sampler1$LobsterID, sampler2$LobsterID)) 

Nice job subsetting the rows! Thanks! – user24537

combined <- bind_rows(sampler1, sampler2) 


# IDs that occur in both sampler1 and sampler2
Lobsters.2.sample <- as.character(unique(sampler1$LobsterID)[unique(sampler1$LobsterID) %in% unique(sampler2$LobsterID)]) 

combined <- combined[combined$LobsterID %in% Lobsters.2.sample,] 

Bind the rows, group by LobsterID, and filter on the number of distinct samplers in each group:

sampler1 %>% bind_rows(sampler2) %>% 
    group_by(LobsterID) %>% 
    filter(n_distinct(Sampler) == 2) 

## Source: local data frame [15 x 3] 
## Groups: LobsterID [2] 
## 
## LobsterID Area Sampler 
##  <chr> <dbl> <chr> 
## 1 Lobster1 0.40 Sampler1 
## 2 Lobster1 0.35 Sampler1 
## 3 Lobster2 1.10 Sampler1 
## 4 Lobster2 1.04 Sampler1 
## 5 Lobster2 1.14 Sampler1 
## 6 Lobster2 1.10 Sampler1 
## 7 Lobster2 1.05 Sampler1 
## 8 Lobster1 0.41 Sampler2 
## 9 Lobster1 0.44 Sampler2 
## 10 Lobster1 0.47 Sampler2 
## 11 Lobster1 0.43 Sampler2 
## 12 Lobster1 0.38 Sampler2 
## 13 Lobster2 1.14 Sampler2 
## 14 Lobster2 1.11 Sampler2 
## 15 Lobster2 1.09 Sampler2 
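The same group-then-filter idea can be sketched in base R with ave(), which computes a value within each LobsterID group and returns it for every row. This is a swapped-in base R equivalent rather than part of the answer above, and `n_samplers` is a hypothetical helper name.

```r
# Toy data from the question, with character columns
sampler1 <- data.frame(
  LobsterID = c(rep("Lobster1", 2), rep("Lobster2", 5), rep("Lobster3", 3)),
  Area = c(.4, .35, 1.1, 1.04, 1.14, 1.1, 1.05, 1.7, 1.63, 1.8),
  Sampler = rep("Sampler1", 10),
  stringsAsFactors = FALSE
)
sampler2 <- data.frame(
  LobsterID = c(rep("Lobster1", 5), rep("Lobster2", 3), rep("Lobster4", 2)),
  Area = c(.41, .44, .47, .43, .38, 1.14, 1.11, 1.09, 1.41, 1.4),
  Sampler = rep("Sampler2", 10),
  stringsAsFactors = FALSE
)
combined <- rbind(sampler1, sampler2)

# ave() splits Sampler by LobsterID, applies FUN to each piece and writes the
# result back to every row of that group; the output keeps the character
# type of Sampler, hence the as.integer()
n_samplers <- as.integer(ave(combined$Sampler, combined$LobsterID,
                             FUN = function(s) length(unique(s))))

# Keep only rows whose lobster was measured by both samplers
result <- combined[n_samplers == 2, ]
```

Like the dplyr version, this keeps the rows in their original order; unlike it, the result is a plain data frame without grouping information.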

Using base R

combined <-rbind(sampler1, sampler2) 
inBoth <- intersect(sampler1[["LobsterID"]], sampler2[["LobsterID"]]) 
output <- combined[combined[["LobsterID"]] %in% inBoth, ] 

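intersect finds the intersection of the two vectors, giving you the lobsters that appear in both samples. All of these functions are vectorized, so it should run very quickly.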
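To make the distinction concrete, here is a small worked example on the lobster IDs from the question: intersect() keeps only the IDs common to both vectors, whereas union() keeps every ID and so would not drop anything. The vector names `ids1` and `ids2` are illustrative.

```r
ids1 <- c("Lobster1", "Lobster2", "Lobster3")  # lobsters measured by Sampler1
ids2 <- c("Lobster1", "Lobster2", "Lobster4")  # lobsters measured by Sampler2

intersect(ids1, ids2)  # measured by both samplers: keep these
## [1] "Lobster1" "Lobster2"
union(ids1, ids2)      # measured by either sampler: filtering on this drops nothing
## [1] "Lobster1" "Lobster2" "Lobster3" "Lobster4"
c(setdiff(ids1, ids2), setdiff(ids2, ids1))  # measured by only one sampler: dropped
## [1] "Lobster3" "Lobster4"
```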


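Here is an option using data.table. Bind the datasets with rbindlist, group by "LobsterID", and subset the rows with a logical condition based on the number of unique elements in "Sampler" being equal to 2.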

library(data.table) 
rbindlist(list(sampler1, sampler2))[, if(uniqueN(Sampler)==2) .SD , by = LobsterID]
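A variant of the same data.table approach, sketched here under the assumption that the data.table package is installed, collects the matching row numbers with .I instead of returning .SD. Sorting those row numbers keeps the original row order, whereas the .SD version returns the rows grouped by LobsterID with LobsterID moved to the first column. The names `dt` and `rows` are hypothetical.

```r
library(data.table)

# Toy data from the question; data.table() keeps strings as character
sampler1 <- data.table(
  LobsterID = c(rep("Lobster1", 2), rep("Lobster2", 5), rep("Lobster3", 3)),
  Area = c(.4, .35, 1.1, 1.04, 1.14, 1.1, 1.05, 1.7, 1.63, 1.8),
  Sampler = rep("Sampler1", 10)
)
sampler2 <- data.table(
  LobsterID = c(rep("Lobster1", 5), rep("Lobster2", 3), rep("Lobster4", 2)),
  Area = c(.41, .44, .47, .43, .38, 1.14, 1.11, 1.09, 1.41, 1.4),
  Sampler = rep("Sampler2", 10)
)
dt <- rbindlist(list(sampler1, sampler2))

# Within each LobsterID group, .I holds that group's row numbers in dt;
# indexing it with a single TRUE/FALSE keeps all or none of them.
# The unnamed j result comes back in a column called V1.
rows <- dt[, .I[uniqueN(Sampler) == 2], by = LobsterID]$V1

# sort() restores the original row order before subsetting
result <- dt[sort(rows)]
```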