2016-01-25

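Manipulating a data frame where each experiment has multiple columns

I have many sequencing experiments, and each experiment has multiple results for each of several hundred genes. When the data is exported from another program, the format is not useful to me: every experiment and every result is listed across the top, and each gene has one row. I have written a sample data set and an example of how I currently solve this, but I would like a more efficient approach because my data sets are very large.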

col1<- c("","", "gene1", "gene2", "gene3", "gene4") 
col2<- c("Experiment1", "Part 1", "a","b","c","d") 
col3<- c("Experiment1", "Part 2", "e", "f", "g", "h") 
col4<- c("Experiment2", "Part 1", "i", "j", "k", "l") 
col5<- c("Experiment2", "Part 2", "m", "n", "o", "p") 
pp<- data.frame(col1,col2,col3,col4,col5) 
one<-data.frame(pp$col1, pp$col2) 
onetwo<- data.frame(pp$col1,pp$col3) 
two<-data.frame(pp$col1, pp$col4) 
twotwo<-data.frame(pp$col1,pp$col5) 

one$V3[3:6]<-as.character(one[2,2]) 
one<-one[-2,] 
one<-one[-1,] 
colnames(one)<- c("gene", "Experiment 1", "part") 

onetwo$V3[3:6]<-as.character(onetwo[2,2]) 
onetwo<-onetwo[-2,] 
onetwo<-onetwo[-1,] 
colnames(onetwo)<- c("gene", "Experiment 1", "part") 

x1<-rbind(one, onetwo) 

two$V3[3:6]<-as.character(two[2,2]) 
two<-two[-2,] 
two<-two[-1,] 
colnames(two)<- c("gene", "Experiment 2", "part") 


twotwo$V3[3:6]<-as.character(twotwo[2,2]) 
twotwo<-twotwo[-2,] 
twotwo<-twotwo[-1,] 
colnames(twotwo)<- c("gene", "Experiment 2", "part") 

x2<-rbind(two, twotwo) 

x3<-merge(x1,x2) 

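I apologize for the large amount of code, but I could not describe this operation in words alone. pp is the sample data frame and x3 is the format I need. Is there a better way?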

Answer

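This is probably a shorter way to do it:

# Transpose pp and drop the first row, which holds the gene-name column.
# row.names = 1 does not match the number of rows, so the result gets
# automatic row names (1, 2, ...) instead of col2 ... col5.
pp.new <- as.data.frame(t(pp)[-1,], row.names = 1)
names(pp.new) <- c("experiment", "part", "gene1", "gene2", "gene3", "gene4")

This gives:

> pp.new
   experiment   part gene1 gene2 gene3 gene4
1 Experiment1 Part 1     a     b     c     d
2 Experiment1 Part 2     e     f     g     h
3 Experiment2 Part 1     i     j     k     l
4 Experiment2 Part 2     m     n     o     p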
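
With several hundred genes, typing the column names by hand is not practical. Here is a minimal sketch that takes the names from the data instead, assuming the gene names always sit in the first column below the two header rows, as in the example data. The name gene_names is only illustrative:

```r
# Rebuild the example data from the question (as plain character columns).
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

# Assumption: gene names are in the first column, below the two header rows.
gene_names <- as.character(pp$col1[-(1:2)])

# Transpose and drop the row that came from the gene-name column,
# leaving one row per experiment/part combination.
m <- t(pp)[-1, , drop = FALSE]
pp.new <- as.data.frame(m, stringsAsFactors = FALSE)
rownames(pp.new) <- NULL

# Name the columns from the data rather than hard-coding them.
names(pp.new) <- c("experiment", "part", gene_names)
```

The same lines should work unchanged for any number of genes or experiment columns, as long as the layout assumption holds.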

However, it is probably better to convert the data to long format with the reshape2 package:

library(reshape2)
pp.long <- melt(pp.new, id.vars = c("experiment", "part"))

The result is:

> pp.long
    experiment   part variable value
1  Experiment1 Part 1    gene1     a
2  Experiment1 Part 2    gene1     e
3  Experiment2 Part 1    gene1     i
4  Experiment2 Part 2    gene1     m
5  Experiment1 Part 1    gene2     b
6  Experiment1 Part 2    gene2     f
7  Experiment2 Part 1    gene2     j
8  Experiment2 Part 2    gene2     n
9  Experiment1 Part 1    gene3     c
10 Experiment1 Part 2    gene3     g
11 Experiment2 Part 1    gene3     k
12 Experiment2 Part 2    gene3     o
13 Experiment1 Part 1    gene4     d
14 Experiment1 Part 2    gene4     h
15 Experiment2 Part 1    gene4     l
16 Experiment2 Part 2    gene4     p
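
If installing reshape2 is not an option, a similar long table can be built in base R straight from pp. This is only a sketch, assuming the first two rows hold the experiment and part labels and the first column holds the gene names. The rows come out ordered by source column rather than by gene, and the gene column is called gene rather than variable (my own naming):

```r
# Rebuild the example data from the question (as plain character columns).
pp <- data.frame(
  col1 = c("", "", "gene1", "gene2", "gene3", "gene4"),
  col2 = c("Experiment1", "Part 1", "a", "b", "c", "d"),
  col3 = c("Experiment1", "Part 2", "e", "f", "g", "h"),
  col4 = c("Experiment2", "Part 1", "i", "j", "k", "l"),
  col5 = c("Experiment2", "Part 2", "m", "n", "o", "p"),
  stringsAsFactors = FALSE
)

# Assumed layout: row 1 = experiment labels, row 2 = part labels,
# column 1 = gene names, everything else = results.
experiments <- as.character(unlist(pp[1, -1]))
parts       <- as.character(unlist(pp[2, -1]))
genes       <- as.character(pp$col1[-(1:2)])
values      <- as.matrix(pp[-(1:2), -1])

# as.vector() unrolls the matrix column by column, so each label is
# repeated once per gene and the gene names are recycled per column.
pp.long <- data.frame(
  experiment = rep(experiments, each = length(genes)),
  part       = rep(parts, each = length(genes)),
  gene       = rep(genes, times = ncol(values)),
  value      = as.vector(values),
  stringsAsFactors = FALSE
)
```

Because this skips the transpose and works on whole vectors, it may also be worth timing against the melt() route on a large data set.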

If you want output comparable to the format asked for in the question (x3), you can use the recast function (also from the reshape2 package):

recast(pp.new, part + variable ~ experiment, id.var=c("experiment","part"), value.var = "value") 

This gives:

    part variable Experiment1 Experiment2
1 Part 1    gene1           a           i
2 Part 1    gene2           b           j
3 Part 1    gene3           c           k
4 Part 1    gene4           d           l
5 Part 2    gene1           e           m
6 Part 2    gene2           f           n
7 Part 2    gene3           g           o
8 Part 2    gene4           h           p
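
A comparable wide table can also be sketched with base R's reshape(). To keep the snippet runnable on its own, it is applied here to a hand-built copy of the long table, with the gene column named gene (my own choice, not what melt() produces by default):

```r
# Hand-built long table equivalent to the example data:
# one row per experiment / part / gene combination.
pp.long <- data.frame(
  experiment = rep(c("Experiment1", "Experiment2"), each = 8),
  part       = rep(rep(c("Part 1", "Part 2"), each = 4), times = 2),
  gene       = rep(paste0("gene", 1:4), times = 4),
  value      = letters[1:16],
  stringsAsFactors = FALSE
)

# Spread the experiments into columns: one row per part/gene pair,
# one value column per experiment.
wide <- reshape(
  pp.long,
  idvar     = c("part", "gene"),
  timevar   = "experiment",
  v.names   = "value",
  direction = "wide"
)

# reshape() names the new columns value.Experiment1, value.Experiment2;
# strip the prefix and reset the row names for readability.
names(wide) <- sub("^value\\.", "", names(wide))
rownames(wide) <- NULL
```

Each row of wide then holds one part and gene with the Experiment1 and Experiment2 results side by side, much like the recast() output.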