2016-12-22 50 views
-4

Create multiple output dataframes from multiple input dataframes in R in a generalized way

I have n input dataframes, each of which has a TimeStamp column + k numeric columns.

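I want to convert them into k output dataframes, each of which will have a TimeStamp column + n numeric columns, such that numeric column i of output dataframe j holds the values of numeric column j of input dataframe i (the column indices exclude the TimeStamp column, which is the first column). Missing TimeStamps should be filled with NAs.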

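The first column in these dataframes is always the TimeStamp column (and the TimeStamps may overlap across dataframes).

The number of rows differs between the input dataframes (they may have different TimeStamps).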

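For example, each of the n=2 dataframes d1, d2 has the following structure (a sample of dataframe d1 is shown below with k=4; k can be arbitrary but will be the same for every dataframe), and each dataframe is stored in a separate CSV file: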

d1 <- structure(list(TimeStamp = structure(1:6, .Label = c("2016-12-20 10:17:20", "2016-12-20 10:19:20", "2016-12-20 10:19:40", "2016-12-20 10:20:00", "2016-12-20 10:20:20", "2016-12-20 10:20:40", "2016-12-20 10:21:00", 
"2016-12-20 10:21:20", "2016-12-20 10:21:40", "2016-12-20 10:22:00", 
"2016-12-20 10:22:20", "2016-12-20 10:22:40", "2016-12-20 10:23:00", 
"2016-12-20 10:23:20", "2016-12-20 10:23:40", "2016-12-20 10:24:00", 
"2016-12-20 10:24:20", "2016-12-20 10:24:40", "2016-12-20 10:25:00", 
"2016-12-20 10:25:20", "2016-12-20 10:25:40", "2016-12-20 10:26:00", 
"2016-12-20 10:26:20", "2016-12-20 10:26:40", "2016-12-20 10:27:00", 
"2016-12-20 10:27:20", "2016-12-20 10:27:40", "2016-12-20 10:28:00", 
"2016-12-20 10:28:20", "2016-12-20 10:28:40", "2016-12-20 10:29:00", 
"2016-12-20 10:29:20", "2016-12-20 10:29:40", "2016-12-20 10:30:00", 
"2016-12-20 10:30:20", "2016-12-20 10:30:40", "2016-12-20 10:31:00", 
"2016-12-20 10:31:20", "2016-12-20 10:31:40", "2016-12-20 10:32:00", 
"2016-12-20 10:32:20", "2016-12-20 10:32:40", "2016-12-20 10:33:00", 
"2016-12-20 10:33:20", "2016-12-20 10:33:40", "2016-12-20 10:34:00", 
"2016-12-20 10:34:20", "2016-12-20 10:34:40", "2016-12-20 10:35:00", 
"2016-12-20 10:35:20", "2016-12-20 10:35:40", "2016-12-20 10:36:00", 
"2016-12-20 10:37:00", "2016-12-20 10:37:20", "2016-12-20 10:37:40", 
"2016-12-20 10:38:00", "2016-12-20 10:38:20", "2016-12-20 10:40:40", 
"2016-12-20 10:41:20", "2016-12-20 10:41:40", "2016-12-20 10:44:20", 
"2016-12-20 10:44:40", "2016-12-20 10:46:00", "2016-12-20 10:49:40", 
"2016-12-20 10:50:00", "2016-12-20 10:50:20", "2016-12-20 10:55:00", 
"2016-12-20 10:56:00", "2016-12-20 10:57:20", "2016-12-20 10:59:20", 
"2016-12-20 10:59:40", "2016-12-20 11:00:20", "2016-12-20 11:01:20", 
"2016-12-20 11:05:40", "2016-12-20 11:06:00", "2016-12-20 11:07:20", 
"2016-12-20 11:08:20", "2016-12-20 11:08:40", "2016-12-20 11:11:40", 
"2016-12-20 11:12:00", "2016-12-20 11:14:20", "2016-12-20 11:14:40", 
"2016-12-20 11:15:00", "2016-12-20 11:15:20", "2016-12-20 11:15:40", 
"2016-12-20 11:16:00", "2016-12-20 11:16:20", "2016-12-20 11:18:20", 
"2016-12-20 11:18:40", "2016-12-20 11:19:00", "2016-12-20 11:19:20", 
"2016-12-20 11:19:40", "2016-12-20 11:21:20", "2016-12-20 11:21:40", 
"2016-12-20 11:22:20", "2016-12-20 11:22:40", "2016-12-20 11:23:00", 
"2016-12-20 11:23:20", "2016-12-20 11:25:00", "2016-12-20 11:25:20", 
"2016-12-20 11:26:00", "2016-12-20 11:26:40", "2016-12-20 11:27:00", 
"2016-12-20 11:27:20", "2016-12-20 11:27:40", "2016-12-20 11:28:00", 
"2016-12-20 11:28:20", "2016-12-20 11:28:40", "2016-12-20 11:34:40", 
"2016-12-20 11:36:20", "2016-12-20 11:36:40", "2016-12-20 11:41:00", 
"2016-12-20 11:41:20", "2016-12-20 11:42:20", "2016-12-20 11:42:40", 
"2016-12-20 11:46:40", "2016-12-20 11:47:00", "2016-12-20 11:47:20", 
"2016-12-20 11:47:40", "2016-12-20 11:48:00", "2016-12-20 11:48:20", 
"2016-12-20 11:48:40", "2016-12-20 11:54:00", "2016-12-20 11:54:20", 
"2016-12-20 11:57:40", "2016-12-20 12:00:00", "2016-12-20 12:00:40", 
"2016-12-20 12:01:00", "2016-12-20 12:01:20", "2016-12-20 12:01:40", 
"2016-12-20 12:02:20", "2016-12-20 12:02:40", "2016-12-20 12:03:00", 
"2016-12-20 12:03:20", "2016-12-20 12:03:40", "2016-12-20 12:07:00", 
"2016-12-20 12:07:20", "2016-12-20 12:07:40", "2016-12-20 12:08:00", 
"2016-12-20 12:08:20", "2016-12-20 12:10:20", "2016-12-20 12:10:40" 
), class = "factor"), b1 = c(-76L, 0L, 0L, -76L, -80L, -81L), 
    b2 = c(0L, -74L, -79L, -73L, -79L, -77L), b3 = c(0L, 0L, 
    -88L, -88L, -91L, 0L), b4 = c(0L, 0L, 0L, -78L, -80L, -78L 
    )), .Names = c("TimeStamp", "b1", "b2", "b3", "b4"), row.names = c(NA, 
6L), class = "data.frame") 

head(d1) 
#   TimeStamp b1 b2 b3 b4 
#1 2016-12-20 10:17:20 -76 0 0 0 
#2 2016-12-20 10:19:20 0 -74 0 0 
#3 2016-12-20 10:19:40 0 -79 -88 0 
#4 2016-12-20 10:20:00 -76 -73 -88 -78 
#5 2016-12-20 10:20:20 -80 -79 -91 -80 
#6 2016-12-20 10:20:40 -81 -77 0 -78 

d2 <- structure(list(TimeStamp = structure(137:142, .Label = c("2016-12-20 10:17:20", 
"2016-12-20 10:19:20", "2016-12-20 10:19:40", "2016-12-20 10:20:00", 
"2016-12-20 10:20:20", "2016-12-20 10:20:40", "2016-12-20 10:21:00", 
"2016-12-20 10:21:20", "2016-12-20 10:21:40", "2016-12-20 10:22:00", 
"2016-12-20 10:22:20", "2016-12-20 10:22:40", "2016-12-20 10:23:00", 
"2016-12-20 10:23:20", "2016-12-20 10:23:40", "2016-12-20 10:24:00", 
"2016-12-20 10:24:20", "2016-12-20 10:24:40", "2016-12-20 10:25:00", 
"2016-12-20 10:25:20", "2016-12-20 10:25:40", "2016-12-20 10:26:00", 
"2016-12-20 10:26:20", "2016-12-20 10:26:40", "2016-12-20 10:27:00", 
"2016-12-20 10:27:20", "2016-12-20 10:27:40", "2016-12-20 10:28:00", 
"2016-12-20 10:28:20", "2016-12-20 10:28:40", "2016-12-20 10:29:00", 
"2016-12-20 10:29:20", "2016-12-20 10:29:40", "2016-12-20 10:30:00", 
"2016-12-20 10:30:20", "2016-12-20 10:30:40", "2016-12-20 10:31:00", 
"2016-12-20 10:31:20", "2016-12-20 10:31:40", "2016-12-20 10:32:00", 
"2016-12-20 10:32:20", "2016-12-20 10:32:40", "2016-12-20 10:33:00", 
"2016-12-20 10:33:20", "2016-12-20 10:33:40", "2016-12-20 10:34:00", 
"2016-12-20 10:34:20", "2016-12-20 10:34:40", "2016-12-20 10:35:00", 
"2016-12-20 10:35:20", "2016-12-20 10:35:40", "2016-12-20 10:36:00", 
"2016-12-20 10:37:00", "2016-12-20 10:37:20", "2016-12-20 10:37:40", 
"2016-12-20 10:38:00", "2016-12-20 10:38:20", "2016-12-20 10:40:40", 
"2016-12-20 10:41:20", "2016-12-20 10:41:40", "2016-12-20 10:44:20", 
"2016-12-20 10:44:40", "2016-12-20 10:46:00", "2016-12-20 10:49:40", 
"2016-12-20 10:50:00", "2016-12-20 10:50:20", "2016-12-20 10:55:00", 
"2016-12-20 10:56:00", "2016-12-20 10:57:20", "2016-12-20 10:59:20", 
"2016-12-20 10:59:40", "2016-12-20 11:00:20", "2016-12-20 11:01:20", 
"2016-12-20 11:05:40", "2016-12-20 11:06:00", "2016-12-20 11:07:20", 
"2016-12-20 11:08:20", "2016-12-20 11:08:40", "2016-12-20 11:11:40", 
"2016-12-20 11:12:00", "2016-12-20 11:14:20", "2016-12-20 11:14:40", 
"2016-12-20 11:15:00", "2016-12-20 11:15:20", "2016-12-20 11:15:40", 
"2016-12-20 11:16:00", "2016-12-20 11:16:20", "2016-12-20 11:18:20", 
"2016-12-20 11:18:40", "2016-12-20 11:19:00", "2016-12-20 11:19:20", 
"2016-12-20 11:19:40", "2016-12-20 11:21:20", "2016-12-20 11:21:40", 
"2016-12-20 11:22:20", "2016-12-20 11:22:40", "2016-12-20 11:23:00", 
"2016-12-20 11:23:20", "2016-12-20 11:25:00", "2016-12-20 11:25:20", 
"2016-12-20 11:26:00", "2016-12-20 11:26:40", "2016-12-20 11:27:00", 
"2016-12-20 11:27:20", "2016-12-20 11:27:40", "2016-12-20 11:28:00", 
"2016-12-20 11:28:20", "2016-12-20 11:28:40", "2016-12-20 11:34:40", 
"2016-12-20 11:36:20", "2016-12-20 11:36:40", "2016-12-20 11:41:00", 
"2016-12-20 11:41:20", "2016-12-20 11:42:20", "2016-12-20 11:42:40", 
"2016-12-20 11:46:40", "2016-12-20 11:47:00", "2016-12-20 11:47:20", 
"2016-12-20 11:47:40", "2016-12-20 11:48:00", "2016-12-20 11:48:20", 
"2016-12-20 11:48:40", "2016-12-20 11:54:00", "2016-12-20 11:54:20", 
"2016-12-20 11:57:40", "2016-12-20 12:00:00", "2016-12-20 12:00:40", 
"2016-12-20 12:01:00", "2016-12-20 12:01:20", "2016-12-20 12:01:40", 
"2016-12-20 12:02:20", "2016-12-20 12:02:40", "2016-12-20 12:03:00", 
"2016-12-20 12:03:20", "2016-12-20 12:03:40", "2016-12-20 12:07:00", 
"2016-12-20 12:07:20", "2016-12-20 12:07:40", "2016-12-20 12:08:00", 
"2016-12-20 12:08:20", "2016-12-20 12:10:20", "2016-12-20 12:10:40" 
), class = "factor"), b1 = c(-76L, 0L, 0L, 0L, -82L, -74L), b2 = c(-87L, 
-76L, 0L, 0L, 0L, -69L), b3 = c(0L, 0L, -84L, -84L, 0L, -85L), 
    b4 = c(-75L, 0L, 0L, 0L, 0L, 0L)), .Names = c("TimeStamp", 
"b1", "b2", "b3", "b4"), row.names = c(NA, 6L), class = "data.frame") 

head(d2)  
#    TimeStamp b1 b2 b3 b4 
# 1 2016-12-20 12:07:20 -76 -87 0 -75 
# 2 2016-12-20 12:07:40 0 -76 0 0 
# 3 2016-12-20 12:08:00 0 0 -84 0 
# 4 2016-12-20 12:08:20 0 0 -84 0 
# 5 2016-12-20 12:10:20 -82 0 0 0 
# 6 2016-12-20 12:10:40 -74 -69 -85 0 
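
Since each input lives in its own CSV file, one possible first step is to read them all into a single named list, so that later steps do not depend on n. The following is only a sketch: the temporary directory, the file names d1.csv and d2.csv, and the tiny sample rows are assumptions made for illustration, not the real data.

```r
# Hypothetical setup: write two small CSV files to a temporary directory
tmp <- tempfile("inputs_")
dir.create(tmp)
write.csv(data.frame(TimeStamp = c("2016-12-20 10:17:20", "2016-12-20 10:19:20"),
                     b1 = c(-76L, 0L), b2 = c(0L, -74L)),
          file.path(tmp, "d1.csv"), row.names = FALSE)
write.csv(data.frame(TimeStamp = "2016-12-20 12:07:20", b1 = -76L, b2 = -87L),
          file.path(tmp, "d2.csv"), row.names = FALSE)

# Read every CSV in the directory into one list of data frames
paths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
inputs <- lapply(paths, read.csv, stringsAsFactors = FALSE)

# Name each list element after its file (d1, d2, ...)
names(inputs) <- tools::file_path_sans_ext(basename(paths))
```

With the inputs held in a list, the number of files no longer appears anywhere in the code.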

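Now I want to have k dataframes, each with n columns (saved as separate CSV files). For example, from the input dataframes d1, d2 above I want the following output dataframes b1, b2, b3, b4 (two of which are shown):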

b1  
    #   TimeStamp d1 d2 
    #2016-12-20 10:17:20 -76 NA 
    #2016-12-20 10:19:20 0 NA 
    #2016-12-20 10:19:40 0 NA 
    #2016-12-20 10:20:00 -76 NA 
    #2016-12-20 10:20:20 -80 NA 
    #2016-12-20 10:20:40 -81 NA 
    #2016-12-20 12:07:20 NA -76 
    #2016-12-20 12:07:40 NA 0 
    #2016-12-20 12:08:00 NA 0 
    #2016-12-20 12:08:20 NA 0 
    #2016-12-20 12:10:20 NA -82 
    #2016-12-20 12:10:40 NA -74 

    b2  
    #   TimeStamp d1 d2 
    #2016-12-20 10:17:20 0 NA 
    #2016-12-20 10:19:20 -74 NA 
    #2016-12-20 10:19:40 -79 NA 
    #2016-12-20 10:20:00 -73 NA 
    #2016-12-20 10:20:20 -79 NA 
    #2016-12-20 10:20:40 -77 NA 
    #2016-12-20 12:07:20 NA -87 
    #2016-12-20 12:07:40 NA -76 
    #2016-12-20 12:08:00 NA 0 
    #2016-12-20 12:08:20 NA 0 
    #2016-12-20 12:10:20 NA 0 
    #2016-12-20 12:10:40 NA -69 

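In the given example the timestamps from the different dataframes are disjoint, but in general timestamps from different dataframes will overlap; in that case no NA filling is needed for the shared timestamps (because the values will be present).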
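
The overlapping case can be illustrated with a full outer join on TimeStamp, which keeps every timestamp from either side and inserts NA only where one side has no row. This is a minimal sketch using base R's merge(); the data frames x and y and their values are made up for the illustration.

```r
# Two hypothetical inputs that share the timestamp "2016-12-20 10:19:20"
x <- data.frame(TimeStamp = c("2016-12-20 10:17:20", "2016-12-20 10:19:20"),
                b1 = c(-76L, 0L), stringsAsFactors = FALSE)
y <- data.frame(TimeStamp = c("2016-12-20 10:19:20", "2016-12-20 12:07:20"),
                b1 = c(-74L, -82L), stringsAsFactors = FALSE)

# Rename the value column after its source data frame
names(x)[2] <- "d1"
names(y)[2] <- "d2"

# all = TRUE requests a full outer join: unmatched timestamps get NA
out <- merge(x, y, by = "TimeStamp", all = TRUE)
```

Here out has three rows: the shared timestamp carries a value in both d1 and d2, while the two unshared timestamps each have one NA.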

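What is the simplest, most efficient and most general way to do this (with base R / dplyr / tidyr / data.table, preferably without loops)? n and k can be treated as constants, and the dataframes can be arbitrarily large.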

+1

Maybe something like `Map(data.frame, d1 = d1, d2 = d2)`? – Sotos

+0

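I get this error: `Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 185, 142`, because the dataframes do not have exactly the same number of rows. I will also update my post; the problem is slightly more complicated, since it involves the timestamps as well. –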

+1

Yes, you should update your example then. – Sotos

Answers

1

Maybe you can try this:

# read d1 data from PATH1
d1_df <- read.table("PATH1", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# store d1 column names (excluding TimeStamp)
d1_colname <- colnames(d1_df)[-1]
# read d2 data from PATH2
d2_df <- read.table("PATH2", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# store d2 column names (excluding TimeStamp)
d2_colname <- colnames(d2_df)[-1]
# concatenate the timestamps of the two data frames
TimeStamp <- c(d1_df[, 1], d2_df[, 1])
# pair up the column names: row 1 = d1, row 2 = d2, one column per output
merge_colname <- rbind(d1_colname, d2_colname)
# build one output data frame for a pair of column names
merge_df <- function(vec_colname) {
    # pad with real NA (not the string "NA") so the columns stay numeric
    d1 <- c(d1_df[, vec_colname[1]], rep(NA, nrow(d2_df)))
    d2 <- c(rep(NA, nrow(d1_df)), d2_df[, vec_colname[2]])
    data.frame(TimeStamp, d1, d2)
}
# get the result, which is a list of data frames
res_list <- apply(merge_colname, 2, merge_df)
# create data frames b1, b2, ... from the list
for (i in seq_along(res_list)) {
    assign(paste0("b", i), res_list[[i]])
}

And the result:

> b1 
      TimeStamp d1 d2 
1 2016-12-20 10:17:20 -76 NA 
2 2016-12-20 10:19:20 0 NA 
3 2016-12-20 10:19:40 0 NA 
4 2016-12-20 10:20:00 -76 NA 
5 2016-12-20 10:20:20 -80 NA 
6 2016-12-20 10:20:40 -81 NA 
7 2016-12-20 12:07:20 NA -76 
8 2016-12-20 12:07:40 NA 0 
9 2016-12-20 12:08:00 NA 0 
10 2016-12-20 12:08:20 NA 0 
11 2016-12-20 12:10:20 NA -82 
12 2016-12-20 12:10:40 NA -74 
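
The code above simply stacks the two inputs, so it fits the disjoint-timestamp example. For arbitrary n and k with overlapping timestamps, one possible base R sketch is to full-outer-join on TimeStamp instead. The list inputs and its sample values below are assumptions for illustration, and the sketch assumes every input has TimeStamp as its first column followed by the same k numeric columns.

```r
# Small hypothetical stand-ins for the n input data frames
inputs <- list(
  d1 = data.frame(TimeStamp = c("2016-12-20 10:17:20", "2016-12-20 10:19:20"),
                  b1 = c(-76L, 0L), b2 = c(0L, -74L), stringsAsFactors = FALSE),
  d2 = data.frame(TimeStamp = c("2016-12-20 10:19:20", "2016-12-20 12:07:20"),
                  b1 = c(-82L, -74L), b2 = c(0L, -69L), stringsAsFactors = FALSE)
)

k <- ncol(inputs[[1]]) - 1   # number of numeric columns per input

# For numeric column j: take (TimeStamp, column j) from every input, label the
# value column with the input's name, then full-outer-join all pieces
outputs <- lapply(seq_len(k), function(j) {
  pieces <- Map(function(df, nm) setNames(df[, c(1, j + 1)], c("TimeStamp", nm)),
                inputs, names(inputs))
  Reduce(function(a, b) merge(a, b, by = "TimeStamp", all = TRUE), pieces)
})

# Name the outputs after the numeric columns of the first input (b1, b2, ...)
names(outputs) <- names(inputs[[1]])[-1]
```

Each element of outputs could then be saved with write.csv(), for example in an lapply() over names(outputs). For very large inputs, repeated merge() calls may be slow, and a keyed data.table join would be worth comparing.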
+0

Please see the updated post, which takes the timestamps into account. –

+0

@sandipan I have updated my answer; you can check it. –

+0

@docendodiscimus Sorry, I will edit my code. –