2015-09-09 77 views
0

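Merge multiple data frames based on matching timestamps

I have 6 data frames. Each has unique column names and the same number of columns, and the data were all collected over the same time period.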

Each data frame has a timestamp column and minute averages, but some data frames are missing data, so their row counts differ.

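I want to merge the data frames so that all 6 are shown side by side, but only for timestamps where all 6 data frames have data. In practice that is limited by the df with the fewest rows, "H1_min":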

> head(H1_min)
                h1min h1temp h1humid   h1db     h1hz
1 2015-09-06 00:00:00   21.5   73.10 39.252 117.1900
2 2015-09-06 00:02:00   21.5   72.50 39.434 125.0000
3 2015-09-06 00:03:00   21.5   72.65 39.338 127.9325
4 2015-09-06 00:04:00   21.5   73.00 39.206 148.4400
5 2015-09-06 00:06:00   21.5   73.00 39.253 144.5350
6 2015-09-06 00:07:00   21.5   72.30 39.293 156.2500

The other data frames are named the same way, with H1 replaced by H2 through H6.

dput(head(H2_min)) 

"2015-09-08 20:21:00", "2015-09-08 20:22:00", "2015-09-08 20:23:00", 
"2015-09-08 20:24:00", "2015-09-08 20:25:00", "2015-09-08 20:26:00", 
"2015-09-08 20:27:00", "2015-09-08 20:28:00", "2015-09-08 20:29:00", 
"2015-09-08 20:30:00", "2015-09-08 20:31:00", "2015-09-08 20:32:00", 
"2015-09-08 20:33:00", "2015-09-08 20:34:00", "2015-09-08 20:35:00" 
), class = "factor"), h2temp = c(23.4, 23.4, 23.3, 23.2, 23.2, 
23.1), h2humid = c(38.5, 38.3, 38.05, 38.1, 38.6, 38.6), h2db = c(38.834, 
38.655, 38.679, 38.695, 38.806, 38.702), h2hz = c(191.41, 152.34, 
162.11, 113.28, 121.09, 164.06)), .Names = c("h2min", "h2temp", 
"h2humid", "h2db", "h2hz"), row.names = c(NA, 6L), class = "data.frame") 

dput(head(H4_min)) 

"2015-09-08 17:10:00", "2015-09-08 17:11:00", "2015-09-08 17:12:00", 
"2015-09-08 17:13:00"), class = "factor"), h4temp = c(27.2, 27.2, 
27.2, 27.2, 27.2, 27.2), h4humid = c(33.5, 33.5, 33.5, 33.5, 
33.5, 33.5), h4db = c(36.8225, 36.921, 36.8766666666667, 36.91, 
36.8336666666667, 36.768), h4hz = c(134.765, 136.068333333333, 
137.373333333333, 126.3, 139.323333333333, 128.906666666667)), .Names =  
c("h4min", "h4temp", "h4humid", "h4db", "h4hz"), row.names = c(NA, 6L), class = "data.frame") 

This is my attempt and the error it produces:

H_min<-merge(H1_min, H2_min, H3_min, H4_min, H5_min, H6_min, by.x = 'row.names', by.y ='h1_min') 

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column 
+2

Data containing spaces is hard to read back in. Please provide the output of 'dput(head(H1_min))'. The same output for one more data frame would also help.

+0

Sure, I added it for the second data frame – Evan

+1

@Evan that is not the complete dput output... it should start with 'structure('... – pcantalupo

Answers

1

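Another way to do this is to convert the data.frames to xts objects and then use merge.xts(...), which automatically aligns the rows on their timestamps, and finally convert the result back to a data.frame.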

Most of the code below only creates reproducible sample data. The actual work is the last 6 lines.

# create representative example - you have this already 
# format() with an explicit pattern keeps every timestamp in one layout
time <- format(as.POSIXct("2015-09-06") + 60*(0:30), "%Y-%m-%d %H:%M:%S") 
temp = c(23.4, 23.4, 23.3, 23.2, 23.2, 23.1) 
humid = c(38.5, 38.3, 38.05, 38.1, 38.6, 38.6) 
db = c(38.834, 38.655, 38.679, 38.695, 38.806, 38.702) 
hz = c(191.41, 152.34, 162.11, 113.28, 121.09, 164.06) 
set.seed(123) # for reproducible example 
get.df <- function(n, name) { 
    df <- data.frame(min=sort(sample(time,n)), 
        temp=sample(temp,n, replace=TRUE), 
        humid=sample(humid,n,replace=TRUE), 
        db = sample(db,n,replace=TRUE), 
        hz = sample(hz,n,replace=TRUE)) 
    names(df) <- paste0(name,names(df)) 
    df 
} 
H1 <- get.df(20,"h1") # 20 rows at random times 
H2 <- get.df(20,"h2") # 20 rows at random times 
H3 <- get.df(25,"h3") # 25 rows at random times 
H4 <- get.df(30,"h4") # 30 rows at random times 
# you start here 
library(xts) 
lst <- list(H1, H2, H3, H4) 
xts.lst <- lapply(lst, function(df) xts(df[,2:ncol(df)], order.by=as.POSIXct(df[[1]]))) 
result <- do.call(merge.xts, c(xts.lst, all=FALSE)) 
result <- data.frame(result) 
head(result) 
#      h1temp h1humid h1db h1hz h2temp h2humid h2db h2hz h3temp h3humid h3db h3hz h4temp h4humid h4db h4hz 
# 2015-09-06 00:03:00 23.2 38.05 38.679 162.11 23.4 38.5 38.695 121.09 23.3 38.3 38.702 191.41 23.4 38.5 38.679 162.11 
# 2015-09-06 00:04:00 23.1 38.05 38.655 121.09 23.4 38.3 38.679 152.34 23.2 38.1 38.679 121.09 23.1 38.3 38.834 121.09 
# 2015-09-06 00:09:00 23.2 38.50 38.679 162.11 23.4 38.5 38.655 113.28 23.3 38.3 38.834 191.41 23.4 38.6 38.655 191.41 
# 2015-09-06 00:12:00 23.4 38.30 38.806 164.06 23.4 38.3 38.679 164.06 23.4 38.6 38.834 162.11 23.4 38.3 38.655 121.09 
# 2015-09-06 00:13:00 23.4 38.60 38.679 152.34 23.2 38.6 38.655 164.06 23.3 38.6 38.679 162.11 23.4 38.5 38.679 121.09 
# 2015-09-06 00:14:00 23.1 38.50 38.806 191.41 23.2 38.6 38.695 152.34 23.4 38.6 38.834 162.11 23.3 38.5 38.834 191.41 
+0

Thanks for the response! I actually prefer c(xts.lst, all = TRUE), because it shows the gaps where a sensor failed. – Evan
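
To illustrate the difference mentioned in the comment above, here is a minimal sketch of merge() on two xts objects with all = FALSE versus all = TRUE. The objects a and b and their values are made up for illustration, and the time zone is fixed to UTC only to keep the example deterministic.

```r
library(xts)

# Two toy series (hypothetical values); b has no reading at 00:01
t0 <- as.POSIXct("2015-09-06 00:00:00", tz = "UTC")
a <- xts(cbind(h1temp = c(21.5, 21.6, 21.7)), order.by = t0 + 60 * c(0, 1, 2))
b <- xts(cbind(h2temp = c(23.4, 23.3)),       order.by = t0 + 60 * c(0, 2))

# all = FALSE keeps only timestamps present in both series
inner <- merge(a, b, all = FALSE)

# all = TRUE keeps every timestamp and fills the gaps with NA
outer <- merge(a, b, all = TRUE)

print(inner)
print(outer)
```

With all = TRUE the 00:01 row is kept and h2temp is NA there, which is what makes a sensor dropout visible in the merged result.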

0
library(dplyr) 
library(magrittr) 
library(tidyr) 

# tibble() is used here in place of the deprecated data_frame()
H1_min = 
    tibble(
    h1min = c("2015-09-06 00:00:00", "2015-09-06 00:02:00"), 
    h1temp = c(21.5, 21.5), 
    h1humid = c(73.10, 72.50), 
    h1db = c(39.252, 39.434), 
    h1hz = c(117.1900, 125.000)) 

H2_min = H1_min %>% mutate(h1hz = c(117.1900, NA)) 

answer = 
    list(H1_min, H2_min) %>% 
    lapply(. %>% setNames(c("min", 
          "temp", 
          "humid", 
          "db", 
          "hz"))) %>% 
    bind_rows(.id = "location") %>% 
    gather(variable, value, -location, -min) %>% 
    mutate(prefix = "h") %>% 
    unite(new_variable, prefix, location, variable, sep = "") %>% 
    spread(new_variable, value) %>% 
    filter(complete.cases(.)) 
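
For comparison, here is a base R sketch of the same inner join. As far as I can tell, the call in the question fails because merge() joins exactly two data frames (x and y), so the extra frames get matched to other arguments, and 'h1_min' is not a column of H2_min. The toy frames below and the shared key name "time" are assumptions made for illustration, not the asker's data.

```r
# Toy stand-ins for three of the six frames (hypothetical values)
H1_min <- data.frame(h1min  = c("2015-09-06 00:00:00", "2015-09-06 00:02:00"),
                     h1temp = c(21.5, 21.5))
H2_min <- data.frame(h2min  = c("2015-09-06 00:00:00", "2015-09-06 00:01:00",
                                "2015-09-06 00:02:00"),
                     h2temp = c(23.4, 23.4, 23.3))
H3_min <- data.frame(h3min  = c("2015-09-06 00:02:00", "2015-09-06 00:03:00"),
                     h3temp = c(27.2, 27.2))

# Give every frame the same key column name ("time" is an assumed name)
frames <- lapply(list(H1_min, H2_min, H3_min), function(df) {
  names(df)[1] <- "time"
  df$time <- as.character(df$time)  # factors would also merge, text is simpler
  df
})

# merge() handles two frames at a time; Reduce() chains it across the list.
# The default all = FALSE keeps only timestamps present in every frame.
H_min <- Reduce(function(x, y) merge(x, y, by = "time"), frames)
print(H_min)
```

Passing all = TRUE inside the merge() call would instead keep every timestamp and fill the missing readings with NA.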