2017-09-13 221 views
-2

Manipulating data.frames

I have several data.frame objects with two columns each. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n}.

> Experiment1
  Name Statistic
1    a    -1.050
2    b     0.058
3    c     0.489
4    d     1.153
5    e     0.736
6    f    -1.155
7    g     0.186

> Experiment2
  Name Statistic
1    a     0.266
2    b     0.067
3    c    -0.385
4    d     0.068
5    e     1.563
6    f     0.745
7    g     1.671

> Experiment3
  Name Statistic
1    a     0.004
2    b    -2.074
3    c     0.746
4    d     0.207
5    e     0.700
6    f     0.158
7    g     0.067

> Experiment4
  Name Statistic
1    a     0.255
2    b    -0.542
3    c     0.477
4    d     1.552
5    e     0.025
6    f     1.027
7    g     0.326

> Experiment5
  Name Statistic
1    a     1.817
2    b     0.147
3    c     0.052
4    d     0.194
5    e    -0.137
6    f     2.321
7    g    -0.939

> Experiment6
  Name Statistic
1    a     1.817
2    b     0.147
3    c     0.052
4    d     0.194
5    e    -0.137
6    f     2.321
7    g    -0.939

> ExperimentalDesign$metabolite
 [1] "butyrate"     "h2s"          "hippurate"    "acetate"      "propionate"   "butyrate_2"
 [7] "h2s_2"        "hippurate_2"  "acetate_2"    "propionate_2"

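I have different data.frame objects with three columns. These data.frame objects are called Experiment1, Experiment2, Experiment3 ... Experiment{n} (where n is NumberTubes divided by NumberParameters). I want to merge the $Statistic column from each data.frame object into tables, with 3 Statistic columns per output table, e.g. tab_1 <- cbind(Experiment1, Experiment2$Statistic, Experiment3$Statistic). In addition, each table should take its metabolite from ExperimentalDesign$metabolite in order. For example, Table_3 would get hippurate.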

  1. NumberRepeats <- 3 (TABLE_1 = merge Experiment_1, Experiment_2$Statistic, Experiment_3$Statistic; TABLE_2 = merge Experiment_4, Experiment_5$Statistic, Experiment_6$Statistic; etc.)
  2. Experiment_n <- 17 (e.g. Experiment_1, Experiment_2, etc.)
  3. skipTube <- c(11) (skip Experiment_11)

Desired output:

TABLE_1:

  Experiment1 Experiment2 Experiment3 metabolite
a      -1.050       0.266       0.004   butyrate
b       0.058       0.067      -2.074   butyrate
c       0.489      -0.385       0.746   butyrate
d       1.153       0.068       0.207   butyrate
e       0.736       1.563       0.700   butyrate
f      -1.155       0.745       0.158   butyrate
g       0.186       1.671       0.067   butyrate

TABLE_2:

  Experiment4 Experiment5 Experiment6 metabolite
a       0.255       1.817      -0.827        h2s
b      -0.542       0.147       0.219        h2s
c       0.477       0.052       1.561        h2s
d       1.552       0.194       1.493        h2s
e       0.025      -0.137       0.063        h2s
f       1.027       2.321       0.844        h2s
g       0.326      -0.939      -0.373        h2s

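Tried so far:

The code below merges the Statistic columns of the different data.frame objects into tables. The NumberRepeats variable controls the number of columns per table. Every table stored in the list has NumberRepeats data columns, except possibly the last one.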

# created test data 
for(i in 1:17){ 
    Name <- letters[1:7] 
    Statistic <- round(rnorm(7), 3) 
    assign(paste0("Experiment",i), data.frame(Name, Statistic)) 
}  

# set some parameters 
NumberRepeats <- 3 
Experiment_n <- 17 
skipTube <- c(11) 

# continue from the code above

out <- list() 
list_index <- 1 
counter <- 1 
while(counter <= Experiment_n) { # <= so that the last experiment is not dropped

    tab <- NULL 
    nam <- NULL 
    # collect columns until the table is full or the experiments run out
    while((is.null(tab) || ncol(tab) < NumberRepeats) && Experiment_n >= counter){ 
        if(!any(counter == skipTube)){ 
            tab <- cbind(tab, get(paste0("Experiment", counter))$Statistic) 
            # tab <- as.data.frame(tab) 
            nam <- c(nam, paste0("Experiment", counter)) 
        } 
        counter <- counter + 1 
    } 
    colnames(tab) <- nam 
    rownames(tab) <- as.matrix(Experiment1$Name) 

    out[[list_index]] <- tab 
    assign(paste0('table_', list_index), tab) 

    list_index <- list_index + 1 
} 
out 

Output:

  Experiment1 Experiment2 Experiment3
a       0.136       0.260      -1.089
b       0.946      -1.165      -0.599
c      -0.462      -1.445       0.044
d      -1.936      -0.391       0.622
e       0.537      -0.502       1.192
f       0.259       0.096      -1.873
g       1.352       0.049      -0.644

Desired output from the code above:

  Experiment1 Experiment2 Experiment3 metabolite
a      -1.050       0.266       0.004   butyrate
b       0.058       0.067      -2.074   butyrate
c       0.489      -0.385       0.746   butyrate
d       1.153       0.068       0.207   butyrate
e       0.736       1.563       0.700   butyrate
f      -1.155       0.745       0.158   butyrate
g       0.186       1.671       0.067   butyrate

+3

It would help if you could provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), with both a sample of your data and the code for what you have tried so far. – austensen

+1

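You say three columns but show only two; what else is there? You reference 'NumberParameters' but never demonstrate its use; should that be 'NumberRepeats'? I strongly recommend using a [list of data frames](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) instead of accessing individual frames with 'assign'; it is generally easier, more efficient, and more robust. – r2evans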

+0

I have updated the question to address this. –

Answers

1

Something like this should work, although it is quite manual:

table1 = Reduce(function(x,y){cbind(x,y)}, 
list(Experiment1$Statistic,Experiment2$Statistic, 
Experiment3$Statistic,ExperimentalDesign$metabolite[1])) 

table2 = Reduce(function(x,y){cbind(x,y)}, 
list(Experiment4$Statistic,Experiment5$Statistic, 
Experiment6$Statistic,ExperimentalDesign$metabolite[2])) 
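
One caveat with the manual version: cbind() on numeric vectors plus a character value returns a character matrix, so the statistics stop being numeric. The sketch below shows the coercion and one possible alternative with data.frame(). The toy values are the first three rows of the question's data, and the object names e1, e2, e3 are made up for illustration.

```r
# Toy stand-ins for Experiment1..3 (first three rows from the question)
e1 <- data.frame(Name = c("a", "b", "c"), Statistic = c(-1.050, 0.058, 0.489))
e2 <- data.frame(Name = c("a", "b", "c"), Statistic = c(0.266, 0.067, -0.385))
e3 <- data.frame(Name = c("a", "b", "c"), Statistic = c(0.004, -2.074, 0.746))

# cbind() with a character value coerces every column to character
m <- cbind(e1$Statistic, e2$Statistic, e3$Statistic, "butyrate")
is.character(m)

# data.frame() keeps each column's own type; the single metabolite
# value is recycled to all rows
table1 <- data.frame(
  Experiment1 = e1$Statistic,
  Experiment2 = e2$Statistic,
  Experiment3 = e3$Statistic,
  metabolite  = "butyrate",
  row.names   = as.character(e1$Name)
)
str(table1)
```

With the data.frame version, numeric operations such as colMeans(table1[1:3]) still work on the statistic columns.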

Edit: a more robust solution:

First create a list called ldf that holds all the experiment data.frames:

ldf = list(Experiment1,Experiment2,Experiment3,...,Experimentn) 
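
Typing every object name gets tedious with 17 experiments. As a sketch, assuming the objects Experiment1 ... Experiment17 already exist in the workspace as in the question's test data, mget() can collect them by name, and the skipTube entries can be dropped before collecting. The variable name keep is introduced here for illustration only.

```r
# Recreate the question's test data so the sketch is self-contained
set.seed(1)
for (i in 1:17) {
  assign(paste0("Experiment", i),
         data.frame(Name = letters[1:7], Statistic = round(rnorm(7), 3)))
}

Experiment_n <- 17
skipTube <- c(11)

# Indices of the experiments to keep, in order
keep <- setdiff(seq_len(Experiment_n), skipTube)

# mget() returns a named list of the objects with these names
ldf <- mget(paste0("Experiment", keep))

length(ldf)
names(ldf)[10:11]
```

Because the list is named, the experiment names travel with the data and can be reused as column names later.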

Then:

lapply(1:ceiling(length(ldf)/3), 
    function(t,l,df){ 
    # indices of the (up to) three experiments in group t; 
    # min() keeps the last, possibly shorter, group within bounds 
    ind = ((3*t)-2):min(3*t, length(l)) 
    cbind(Reduce(function(x,y){cbind(x,y)},lapply(l[ind],'[[','Statistic')), 
    df$metabolite[t]) 
    }, 
ldf,ExperimentalDesign) 
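
The same grouping can also be sketched with split(), which avoids the index arithmetic and keeps the statistics numeric. This is a sketch under assumptions rather than the answer's method: make_tables is a hypothetical helper name, the toy ldf holds only seven experiments, and the metabolite argument stands in for ExperimentalDesign$metabolite.

```r
# Hypothetical helper: group a named list of experiment data.frames
# into tables of n_repeats statistic columns plus a metabolite column
make_tables <- function(ldf, metabolite, n_repeats = 3) {
  # Group number for every list element: 1,1,1,2,2,2,...
  grp <- ceiling(seq_along(ldf) / n_repeats)
  groups <- split(ldf, grp)
  Map(function(g, met) {
    # One numeric column per experiment, named after the list element
    tab <- as.data.frame(lapply(g, `[[`, "Statistic"))
    rownames(tab) <- as.character(g[[1]]$Name)
    tab$metabolite <- met
    tab
  }, groups, metabolite[seq_along(groups)])
}

# Toy input: seven experiments, so the last table has one statistic column
set.seed(1)
ldf <- lapply(1:7, function(i) {
  data.frame(Name = letters[1:7], Statistic = round(rnorm(7), 3))
})
names(ldf) <- paste0("Experiment", 1:7)

tables <- make_tables(ldf, c("butyrate", "h2s", "hippurate"))
tables[[1]]
```

To honor skipTube, the skipped experiments would be removed from ldf before the call, so the grouping runs over the remaining experiments only.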
+0

@J_Throat please check the updated solution. – TUSHAr

0

If you want to aggregate every 3 tables, this solution should do what you want.

library(reshape2) # dcast() is provided by reshape2, not reshape 

for(i in 1:17){ 
    Name <- letters[1:7] 
    Statistic <- round(rnorm(7), 3) 
    ExperimentName <- rep(paste0("Experiment",i), 7) 
    assign(paste0("Experiment",i), data.frame(ExperimentName, Name, Statistic, stringsAsFactors = FALSE)) 
}  

# set some parameters 
NumberRepeats <- 5 
Experiment_n <- 17 
skipTube <- c(3,7,11) 

# Create dummy list for the metabolites 
metabolites <- c("met1", "met2", "met3", "met4", "met5") 

for (iteration in c(1:Experiment_n)){ 
    # every third experiment closes a group of three 
    if (iteration %% 3 == 0){ 
        # stack the three experiments in long format 
        temp_df <- rbind(get(paste0("Experiment", iteration - 2)), get(paste0("Experiment", iteration - 1)), get(paste0("Experiment", iteration))) 
        print(temp_df) 
        temp_df <- melt(data = temp_df) 
        # one row per Name, one column per experiment 
        aggregates <- dcast(data = temp_df, formula = Name ~ ExperimentName, value.var = "value") 
        aggregates$metabolite <- metabolites[iteration/3] 
        print(aggregates) 
    } 
}