2017-03-28
0

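Group By Then Aggregate

I am working with several large data frames and need to reduce the data to the first and last entry for each boat and net. My data frame looks like this: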

Boat  Net  DateTime 
Dawn  71  2014-07-10 10:10 
Dawn  71  2014-07-15 11:10 
Whip  71  2014-07-17 08:10 
Whip  71  2014-07-29 12:36 
Dawn  71  2014-08-24 14:53 
Whip  71  2014-09-02 11:17 
Whip  73  2014-09-14 16:24 
Whip  71  2014-09-15 18:16 
Whip  73  2014-09-17 20:25 

I need a data frame that contains only the first and last entry of each net for each boat. The data should look like this:

Boat  Net  DateTime 
Dawn  71  2014-07-10 10:10 
Whip  71  2014-07-17 08:10 
Dawn  71  2014-08-24 14:53 
Whip  73  2014-09-14 16:24 
Whip  71  2014-09-15 18:16 
Whip  73  2014-09-17 20:25 

I have tried a few different things and gotten close, but I am not quite there.

Head <- aggregate(df, by = list(df$Net), FUN = head, n = 1) 
Tail <- aggregate(df, by = list(df$Net), FUN = tail, n = 1) 
Final <- rbind(Head, Tail) 

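This works well, but it does not account for different boats using the same net number. I then tried grouping by boat, but got the same result: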

Head <- df %>% group_by(Boat) %>% aggregate(df, by = list(df$Net), FUN = head, n = 1) %>% ungroup 

Both approaches return the following data (only the first and last entry for each net number):

Boat  Net  DateTime 
Dawn  71  2014-07-10 10:10 
Whip  73  2014-09-14 16:24 
Whip  71  2014-09-15 18:16 
Whip  73  2014-09-17 20:25 

I think I am close but cannot quite get there. Any help would be greatly appreciated.

Answers

3

With the aggregate approach, you can get what you want by supplying both df$Boat and df$Net to aggregate:

Head <- aggregate(df, by = list(df$Boat, df$Net), FUN = head, n = 1) 
Tail <- aggregate(df, by = list(df$Boat, df$Net), FUN = tail, n = 1) 
Final <- rbind(Head, Tail) 
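
If the rows should stay in their original order, as in the desired output in the question, one possible base R alternative (not part of the original answer) is to flag the first and last row of each Boat/Net pair with duplicated(). This is only a sketch: it assumes that "first" and "last" mean row position within df, and it rebuilds the question's sample data so that it runs on its own.

```r
# Sample data copied from the question; `df` and its column names are assumed.
df <- data.frame(
  Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn", "Whip", "Whip", "Whip", "Whip"),
  Net = c(71L, 71L, 71L, 71L, 71L, 71L, 73L, 71L, 73L),
  DateTime = c("2014-07-10 10:10", "2014-07-15 11:10", "2014-07-17 08:10",
               "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17",
               "2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25"),
  stringsAsFactors = FALSE
)

keys <- df[c("Boat", "Net")]
is_first <- !duplicated(keys)                   # first row seen for each Boat/Net pair
is_last  <- !duplicated(keys, fromLast = TRUE)  # last row seen for each Boat/Net pair

# Keep rows that are first or last in their group; original row order is preserved.
Final <- df[is_first | is_last, ]
```

Because a row that is both first and last is selected only once, a Boat/Net pair with a single row is not duplicated in the result.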

Since you tried to use dplyr's group_by, here is a dplyr alternative that uses slice by group:

library(dplyr) 

# Keep the first row and the last row of each Boat/Net group 
Final <- df %>% 
    group_by(Boat, Net) %>% 
    slice(c(1, n())) %>% 
    ungroup() 

(Note: combining group_by with aggregate does nothing special. group_by only works with other dplyr functions, such as slice, summarize, and mutate.)
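
One edge case may be worth checking: if a Boat/Net pair has only one row, c(1, n()) evaluates to c(1, 1), and slice() would, as far as I know, return that row twice. A minimal sketch of one way around this, assuming dplyr is installed, wraps the indices in unique(). The data below is made up for illustration; the "Gull"/75 group is hypothetical and does not appear in the question.

```r
library(dplyr)

# Hypothetical data: "Gull"/75 is an invented group with a single row.
df <- data.frame(
  Boat = c("Dawn", "Dawn", "Dawn", "Gull"),
  Net = c(71L, 71L, 71L, 75L),
  DateTime = c("2014-07-10 10:10", "2014-07-15 11:10",
               "2014-08-24 14:53", "2014-09-01 09:00"),
  stringsAsFactors = FALSE
)

Final <- df %>%
    group_by(Boat, Net) %>%
    slice(unique(c(1, n()))) %>%   # unique() collapses c(1, 1) to 1 for one-row groups
    ungroup()
```

With this data, Final should hold the first and last Dawn/71 rows plus a single Gull/75 row.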

That worked a treat! Thanks for your help!

1
do.call(rbind, lapply(split(df, paste(df$Boat, df$Net, sep = "-")), 
      function(a) a[c(1, NROW(a)),])) 
#   Boat Net   DateTime 
#Dawn-71.1 Dawn 71 2014-07-10 10:10 
#Dawn-71.5 Dawn 71 2014-08-24 14:53 
#Whip-71.3 Whip 71 2014-07-17 08:10 
#Whip-71.8 Whip 71 2014-09-15 18:16 
#Whip-73.7 Whip 73 2014-09-14 16:24 
#Whip-73.9 Whip 73 2014-09-17 20:25 

DATA

df = structure(list(Boat = c("Dawn", "Dawn", "Whip", "Whip", "Dawn", 
"Whip", "Whip", "Whip", "Whip"), Net = c(71L, 71L, 71L, 71L, 
71L, 71L, 73L, 71L, 73L), DateTime = c("2014-07-10 10:10", "2014-07-15 11:10", 
"2014-07-17 08:10", "2014-07-29 12:36", "2014-08-24 14:53", "2014-09-02 11:17", 
"2014-09-14 16:24", "2014-09-15 18:16", "2014-09-17 20:25")), .Names = c("Boat", 
"Net", "DateTime"), class = "data.frame", row.names = c(NA, -9L 
)) 
0

Here is an option using data.table:

library(data.table) 
setDT(df)[, .SD[c(1, .N)], .(Boat, Net)] 
# Boat Net   DateTime 
#1: Dawn 71 2014-07-10 10:10 
#2: Dawn 71 2014-08-24 14:53 
#3: Whip 71 2014-07-17 08:10 
#4: Whip 71 2014-09-15 18:16 
#5: Whip 73 2014-09-14 16:24 
#6: Whip 73 2014-09-17 20:25