2015-12-08

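Reshaping a data frame with interleaved rows

I have the following 12700 x 307 data frame of sorted genetic data: individual ID numbers in the columns, locus identifiers in the rows (note that there are two rows per locus).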

alist <- c("loci", 185, 186, 187, 188, 189, 190, 191,
           "A549", 1, 1, 1, 1, 1, 1, 1, "A549", 0, 0, 1, 1, 1, 0, 1,
           "A588", 1, 1, 1, 1, 1, 1, 1, "A588", 0, 0, 0, 0, 0, 0, 1,
           "A794", 1, 1, 1, 1, 1, 1, 1, "A794", 1, 0, 1, 0, 1, 1, 0,
           "A081", 1, 1, 1, 1, 1, 1, 0, "A081", 1, 1, 1, 1, 1, 1, 1)
df <- data.frame(matrix(unlist(alist), nrow = 9, byrow = TRUE), stringsAsFactors = FALSE)
colnames(df) <- df[1, ]
df <- df[-1, ]

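I need to change it into a data frame with two rows per individual and one column per locus. An individual's first row should hold the presence/absence entries for the first allele, and the second row those for the second allele at that locus.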

So it should look like this:

blist <- c("individual", "A549", "A588", "A794", "A081",
           "185", 1, 1, 1, 1, "185", 0, 0, 1, 1,
           "186", 1, 1, 1, 1, "186", 0, 0, 0, 1,
           "187", 1, 1, 1, 1, "187", 1, 0, 1, 1,
           "188", 1, 1, 1, 1, "188", 1, 0, 0, 1,
           "189", 1, 1, 1, 1, "189", 1, 0, 1, 1,
           "190", 1, 1, 1, 1, "190", 0, 0, 1, 1,
           "191", 1, 1, 1, 0, "191", 1, 1, 0, 1)
dfb <- data.frame(matrix(unlist(blist), nrow = 15, byrow = TRUE), stringsAsFactors = FALSE)
colnames(dfb) <- dfb[1, ]
dfb <- dfb[-1, ]

This must be quite doable, but I am not seeing it. I would appreciate any ideas.

Answers


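Consider this base R solution, which combines several data-management steps. It scales to the actual production data because the column references are computed from nrow(df) (8 rows here, 12700 in the real data) instead of being hard-coded:

# TRANSPOSE: INDIVIDUALS BECOME ROWS, LOCUS ROWS BECOME COLUMNS
tdf <- as.data.frame(t(df[, -1]), stringsAsFactors = FALSE)

# SET COLUMN NAMES (EACH LOCUS NAME APPEARS TWICE)
names(tdf) <- df$loci
# ADD INDIVIDUAL COLUMN
tdf$individual <- rownames(tdf)

# STACK MATCHING COLUMNS. n IS THE NUMBER OF LOCUS ROWS IN df
# (8 HERE, 12700 IN THE REAL DATA), NOT THE 307 INDIVIDUAL COLUMNS
n <- nrow(df)
finaldf <- rbind(tdf[, c(ncol(tdf), seq(1, n, 2))],  # ODD COLS: FIRST ALLELE
                 tdf[, c(ncol(tdf), seq(2, n, 2))])  # EVEN COLS: SECOND ALLELE

# ORDER BY INDIVIDUAL COLUMN (order() IS STABLE, SO FIRST ALLELE STAYS FIRST)
finaldf <- finaldf[with(finaldf, order(individual)), ]
rownames(finaldf) <- seq_len(nrow(finaldf))

# CONVERT LOCI COLUMNS TO NUMERIC
finaldf[-1] <- lapply(finaldf[-1], function(x) as.numeric(as.character(x)))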

Output

   individual A549 A588 A794 A081
1         185    1    1    1    1
2         185    0    0    1    1
3         186    1    1    1    1
4         186    0    0    0    1
5         187    1    1    1    1
6         187    1    0    1    1
7         188    1    1    1    1
8         188    1    0    0    1
9         189    1    1    1    1
10        189    1    0    1    1
11        190    1    1    1    1
12        190    0    0    1    1
13        191    1    1    1    0
14        191    1    1    0    1
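A more compact base R sketch of the same reshape, in case it helps. Instead of picking odd and even columns after transposing, it numbers the rows within each locus with ave() and seq_along, so it only assumes that the first row of a locus is the first allele (the two rows need not be adjacent). reshape_alleles() is a hypothetical helper name, and the example data are rebuilt from the question so the block runs on its own.

```r
# Example data from the question: locus rows (two per locus), individual columns
alist <- c("loci", 185, 186, 187, 188, 189, 190, 191,
           "A549", 1, 1, 1, 1, 1, 1, 1, "A549", 0, 0, 1, 1, 1, 0, 1,
           "A588", 1, 1, 1, 1, 1, 1, 1, "A588", 0, 0, 0, 0, 0, 0, 1,
           "A794", 1, 1, 1, 1, 1, 1, 1, "A794", 1, 0, 1, 0, 1, 1, 0,
           "A081", 1, 1, 1, 1, 1, 1, 0, "A081", 1, 1, 1, 1, 1, 1, 1)
m  <- matrix(alist, nrow = 9, byrow = TRUE)
df <- as.data.frame(m[-1, ], stringsAsFactors = FALSE)
names(df) <- m[1, ]

# Hypothetical helper: one row per individual and allele, one column per locus
reshape_alleles <- function(df) {
  # Position of each row within its locus: 1 = first allele row, 2 = second
  allele <- ave(seq_len(nrow(df)), df$loci, FUN = seq_along)
  parts <- lapply(sort(unique(allele)), function(a) {
    sub  <- df[allele == a, , drop = FALSE]
    vals <- t(as.matrix(sub[, -1, drop = FALSE]))  # individuals x loci
    storage.mode(vals) <- "numeric"                 # "0"/"1" strings -> numbers
    colnames(vals) <- sub$loci
    data.frame(individual = rownames(vals), allele = a, vals,
               check.names = FALSE, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, parts)
  # Allele 1 row before allele 2 row for each individual
  out <- out[order(out$individual, out$allele), ]
  out$allele <- NULL
  rownames(out) <- NULL
  out
}

res <- reshape_alleles(df)
print(res)
```

If the real individual IDs differ in width, ordering on as.numeric(out$individual) is safer, since character sorting puts "1000" before "185".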

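Here is an approach using dplyr and tidyr.

It uses gather to convert your data to long format.

It then groups by loci and individual with group_by, and uses mutate to add a row_number column, because you have repeated IDs.

Finally, it uses spread to widen the data again in the required direction, and select to drop the row column: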

library(dplyr) 
library(tidyr) 
df %>% gather(individual, val, -loci) %>% 
     group_by(loci, individual) %>% 
     mutate(row = row_number()) %>% 
     spread(loci, val) %>% 
     select(-row) 

   individual  A081  A549  A588  A794
       (fctr) (chr) (chr) (chr) (chr)
1         185     1     1     1     1
2         185     1     0     0     1
3         186     1     1     1     1
4         186     1     0     0     0
5         187     1     1     1     1
6         187     1     1     0     1
7         188     1     1     1     1
8         188     1     1     0     0
9         189     1     1     1     1
10        189     1     1     0     1
11        190     1     1     1     1
12        190     1     0     0     1
13        191     0     1     1     1
14        191     1     1     1     0
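gather() and spread() still work but are superseded in current tidyr (1.0.0 and later) by pivot_longer() and pivot_wider(). A minimal sketch of the same idea with the newer verbs, assuming tidyr >= 1.0.0 and dplyr are installed; the allele column name is my own choice. Unlike the spread() output above, it converts the values to numbers, and pivot_wider() keeps the loci in their order of first appearance rather than sorting them.

```r
library(dplyr)
library(tidyr)

# Example data from the question (locus rows, individual columns)
alist <- c("loci", 185, 186, 187, 188, 189, 190, 191,
           "A549", 1, 1, 1, 1, 1, 1, 1, "A549", 0, 0, 1, 1, 1, 0, 1,
           "A588", 1, 1, 1, 1, 1, 1, 1, "A588", 0, 0, 0, 0, 0, 0, 1,
           "A794", 1, 1, 1, 1, 1, 1, 1, "A794", 1, 0, 1, 0, 1, 1, 0,
           "A081", 1, 1, 1, 1, 1, 1, 0, "A081", 1, 1, 1, 1, 1, 1, 1)
m  <- matrix(alist, nrow = 9, byrow = TRUE)
df <- as.data.frame(m[-1, ], stringsAsFactors = FALSE)
names(df) <- m[1, ]

reshaped <- df %>%
  # Long format: one row per (locus row, individual)
  pivot_longer(-loci, names_to = "individual", values_to = "val") %>%
  # Number the two rows of each locus within every individual
  group_by(loci, individual) %>%
  mutate(allele = row_number(), val = as.numeric(val)) %>%
  ungroup() %>%
  # Wide again: one column per locus, rows keyed by individual + allele
  pivot_wider(names_from = loci, values_from = val) %>%
  arrange(individual, allele) %>%
  select(-allele)

print(reshaped)
```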