2016-02-05

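Gather/melt wide data into separate value columns

I have data in wide format with two different sets of value columns: those containing masses (Mass1, Mass2, etc.) and those containing the corresponding dates (Mass1_date, Mass2_date, etc.).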

library(tidyr) 
library(dplyr) 
library(lubridate) 

df <- structure(list(Year = 2004, Nest_no = 21, Mass1 = 2325, Mass1_date = structure(1081987200, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Mass2 = 2000, Mass2_date = structure(1082851200, class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Mass3 = 1750, Mass3_date = structure(1083715200, class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame" 
), row.names = c(NA, -1L), .Names = c("Year", "Nest_no", "Mass1", 
"Mass1_date", "Mass2", "Mass2_date", "Mass3", "Mass3_date")) 

df 

## Source: local data frame [1 x 8] 
## 
## Year Nest_no Mass1 Mass1_date Mass2 Mass2_date Mass3 Mass3_date 
## (dbl) (dbl) (dbl)  (time) (dbl)  (time) (dbl)  (time) 
## 1 2004  21 2325 2004-04-15 2000 2004-04-25 1750 2004-05-05 

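I would like to "tidy" the data into long format, where the two sets of value columns are gathered (melted) into two separate value columns, one holding the values from the mass columns and the other the values from the date columns: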

## Source: local data frame [3 x 5] 
## 
## Year Nest_no capture  date weight 
## (dbl) (dbl) (dbl)  (date) (dbl) 
## 1 2004  21  1 2004-04-15 2325 
## 2 2004  21  2 2004-04-25 2000 
## 3 2004  21  3 2004-05-05 1750 

At first, I thought I could use tidyr and do it in two steps.

gather(df, capture, date, contains("Date")) %>% 
    gather(capture2, weight, contains("Mass")) 

## Source: local data frame [9 x 6] 
## 
## Year Nest_no capture  date capture2 weight 
## (dbl) (dbl)  (chr)  (time) (chr) (dbl) 
## 1 2004  21 Mass1_date 2004-04-15 Mass1 2325 
## 2 2004  21 Mass2_date 2004-04-25 Mass1 2325 
## 3 2004  21 Mass3_date 2004-05-05 Mass1 2325 
## 4 2004  21 Mass1_date 2004-04-15 Mass2 2000 
## 5 2004  21 Mass2_date 2004-04-25 Mass2 2000 
## 6 2004  21 Mass3_date 2004-05-05 Mass2 2000 
## 7 2004  21 Mass1_date 2004-04-15 Mass3 1750 
## 8 2004  21 Mass2_date 2004-04-25 Mass3 1750 
## 9 2004  21 Mass3_date 2004-05-05 Mass3 1750 

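However, it did not work as expected: every date was paired with every mass, giving nine rows instead of three. After a few attempts, I arrived at this solution: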

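# Gather all Mass* columns (weights and dates) into one key/value pair;
# mixing numeric and POSIXct values coerces the dates to plain numbers
df <- gather(df, capture2, weight, contains("Mass"), convert = TRUE) %>% 
    mutate(capture = extract_numeric(capture2)) 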

## Warning: attributes are not identical across measure variables; they will 
## be dropped 

df$capture2 <- ifelse(grepl("date", df$capture2), "date", "weight") 

df <- spread(df, capture2, weight) %>% 
    mutate(date = as.Date(as.POSIXct(date, origin = "1970-01-01"))) 

df 

## Source: local data frame [3 x 5] 
## 
## Year Nest_no capture  date weight 
## (dbl) (dbl) (dbl)  (date) (dbl) 
## 1 2004  21  1 2004-04-15 2325 
## 2 2004  21  2 2004-04-25 2000 
## 3 2004  21  3 2004-05-05 1750 

I was wondering whether there is a better way to achieve this?

Thank you, Philip

Answer


We can do this quite easily with melt from data.table. Its measure argument can take multiple patterns of column names and convert the "wide" format to "long".

library(data.table) 
melt(as.data.table(df), measure=patterns('\\d$', 'date$'), 
     variable.name='capture', value.name= c('weight', 'date')) 
# Year Nest_no capture weight  date 
#1: 2004  21  1 2325 2004-04-15 
#2: 2004  21  2 2000 2004-04-25 
#3: 2004  21  3 1750 2004-05-05 
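
For comparison, a tidyr-only route may also be possible in newer releases. The sketch below assumes tidyr 1.0.0 or later (for pivot_longer()) and dplyr 1.0.0 or later (for rename_with()), neither of which was available when this was asked. The "_weight" suffix and the object name long are made up for illustration, and the example row is rebuilt with tibble() rather than taken from the dput() output above.

```r
library(dplyr)
library(tidyr)

# Same single-row example as in the question, rebuilt by hand
df <- tibble(
  Year = 2004, Nest_no = 21,
  Mass1 = 2325, Mass1_date = as.POSIXct("2004-04-15", tz = "UTC"),
  Mass2 = 2000, Mass2_date = as.POSIXct("2004-04-25", tz = "UTC"),
  Mass3 = 1750, Mass3_date = as.POSIXct("2004-05-05", tz = "UTC")
)

long <- df %>%
  # Assumption: add a "_weight" suffix so every name reads Mass<n>_<kind>
  rename_with(~ paste0(.x, "_weight"), matches("^Mass\\d+$")) %>%
  pivot_longer(
    cols = starts_with("Mass"),
    # The digits become the capture number; ".value" turns the
    # suffix (weight / date) into the names of the output columns
    names_to = c("capture", ".value"),
    names_pattern = "Mass(\\d+)_(.*)"
  ) %>%
  mutate(capture = as.integer(capture), date = as.Date(date))

long
```

Because each ".value" column is assembled only from columns of the same type, the dates should stay POSIXct until the final as.Date() call, so the numeric coercion and the attribute warning seen in the question's workaround are not expected here.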

Thank you, your answer is perfect.