2016-12-14

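Find the minimum and maximum values for each unique identifier (grouping element) and create these columns in R

I have the following dataset: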

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5)) 
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5)) 
Dia <- c(870,"NA", 867.3, "NA", "NA", 890.3,"NA","NA",871.2,"NA",868.7,"NA",866.2, "NA", 
"NA",851,"NA","NA",842,"NA","NA",880,860,851.8,"NA",841) 

df <- data.frame(MC,ASN,Dia) 

df 

For each MC, I want to find the minimum and maximum diameter (Dia) values and arrange them in a result table like this:

MC        Dia    Min_Dia  Max_Dia
OS000348  870    867.3    890.3
OS000361  871.2  841      871.2
OS000375  880    841      880

I tried using the dplyr package with the following:

result1 <- 
    df %>% 
    group_by(MC) %>% 
    arrange(MC) %>% 
    slice(c(1, n())) %>% 
    mutate(minmax = c("Min", "Max")) %>% 
    gather(var, val, Dia) %>% 
    unite(key, minmax, var) %>% 
    spread(key, val) 

But I don't get the table the way I want it (the second table above).

Are there any other options?

+0

Don't enter the values as '"NA"'; enter them as 'NA' instead. Then the aggregate function works well: 'aggregate(Dia ~ MC, data = df, FUN = function(x) c(head(x, 1), min(x, na.rm = T), max(x, na.rm = T)))' – bouncyball
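The aggregate() suggestion in the comment above can be spelled out as a short base R sketch. It assumes Dia has been entered with real NA values; the names agg and res, and the column labels Dia, Min_Dia and Max_Dia inside the function, are chosen here for illustration and are not part of the original comment.

```r
# Data from the question, with real NA values instead of the string "NA"
MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df <- data.frame(MC, Dia)

# The formula interface of aggregate() drops rows with NA by default
# (na.action = na.omit), so x[1] is the first non-missing value per group.
agg <- aggregate(Dia ~ MC, data = df,
                 FUN = function(x) c(Dia = x[1], Min_Dia = min(x), Max_Dia = max(x)))

# agg$Dia is a matrix column; flatten it into ordinary columns
res <- data.frame(MC = agg$MC, agg$Dia)
res
```

Because the function returns a vector, aggregate() stores the three values in a single matrix column, which is why the last step rebuilds a plain data frame from it.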

Answers

3

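First, you need to enter NA as NA rather than "NA". Otherwise R reads the whole column as a character vector (which data.frame() may then turn into a factor, depending on the R version), and you cannot use the min() function on it. This code produces the desired output: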

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5)) 
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5)) 
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA, 
     NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841) 

df <- data.frame(MC,ASN,Dia) 

library(dplyr) 

# Add the per-MC minimum and maximum of Dia to every row, ignoring NA values
df <- df %>% 
    group_by(MC) %>% 
    mutate(minDia=min(Dia, na.rm=TRUE), maxDia=max(Dia, na.rm=TRUE)) 

If you only want to keep one observation per MC, use this:

# Same computation, then collapse to one row per MC
df2 <- df %>% 
    group_by(MC) %>% 
    mutate(minDia=min(Dia, na.rm=TRUE), maxDia=max(Dia, na.rm=TRUE)) %>% 
    ungroup() %>% 
    distinct(MC, minDia, maxDia) 
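As a possible variation (not part of the original answer), summarise() can build the one-row-per-MC table directly, including the first Dia value the question asks for. This is a minimal sketch assuming dplyr is installed and Dia holds real NA values; the name res and the column names Min_Dia and Max_Dia are taken from the question's desired table rather than from the answer.

```r
library(dplyr)

# Data from the question, with real NA values instead of the string "NA"
MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df <- data.frame(MC, Dia)

res <- df %>%
  group_by(MC) %>%
  summarise(
    Min_Dia = min(Dia, na.rm = TRUE),
    Max_Dia = max(Dia, na.rm = TRUE),
    # Defined last: once Dia is overwritten, later expressions would see
    # the summarised value instead of the full column.
    Dia = first(Dia[!is.na(Dia)])
  ) %>%
  select(MC, Dia, Min_Dia, Max_Dia)

res
```

With the data as posted, the minimum for OS000361 comes out as 842, not the 841 shown in the question's desired table; 841 only occurs in the OS000375 group.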
+0

Thanks for such a quick response. It shows an error: 'min' not meaningful for factors – ZeekDSA

+1

If you follow @yoland's advice in the first sentence, you won't get this error message. – bouncyball

+0

@bouncyball Hahaha. Thanks, I just noticed it... :) – ZeekDSA
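For anyone who hits the same 'min' not meaningful for factors error with data that already contains the string "NA", the column can also be repaired after the fact instead of retyped. The following is a rough sketch in base R; stringsAsFactors = TRUE is set explicitly only to mimic the default of R versions before 4.0, and dia_chr is a helper name introduced here.

```r
# Data exactly as in the question: "NA" strings force Dia to character
MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, "NA", 867.3, "NA", "NA", 890.3, "NA", "NA", 871.2, "NA", 868.7, "NA", 866.2, "NA",
         "NA", 851, "NA", "NA", 842, "NA", "NA", 880, 860, 851.8, "NA", 841)
df <- data.frame(MC, Dia, stringsAsFactors = TRUE)

is.factor(df$Dia)  # Dia is a factor here, so min(df$Dia) would fail

# Go through character first: as.numeric() on a factor returns level codes
dia_chr <- as.character(df$Dia)
dia_chr[dia_chr == "NA"] <- NA      # turn the "NA" strings into real missing values
df$Dia <- as.numeric(dia_chr)

min(df$Dia, na.rm = TRUE)
max(df$Dia, na.rm = TRUE)
```

After this conversion, the grouped min() and max() calls from the answer should run on df without the factor error.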