下面是一個data.table解決方案,每行註釋。它被寫爲處理mymat
對象中的任意數量的列。我將簡要解釋一下:
1)首先,我們轉換爲data.table格式,我們可以處理任意數量的列,假設它的格式類似。
2)我們找到所有「.GT」列並提取「.GT」之前的數字。
3)我們爲找到的每個「.GT」列創建「.DP」列。
4)我們通過創建映射的「to」部分的向量來開發「GT」到「AD」映射。 「from」部分作爲名稱存儲在向量中。
5)使用data.table中的.SDcols特性將「GT」應用於「AD」映射,並創建「AD」列。
# Your matrix
mymat <- structure(c("0/1", "1/1", "0/0", "0/0"), .Dim = c(2L, 2L),
.Dimnames = list(c("chr1:1163804", "chr1:1888193"),
c("00860.GT", "00861.GT")))
# Using a data table approach
library(data.table)
# Casting to data table - row.names will be converted to a column called 'rn'.
mymat = as.data.table(mymat, keep.rownames = T)
# Find "GT" columns
GTcols = grep("GT", colnames(mymat))
# Get number before ".GT"
selectedCols = gsub(".GT", "", colnames(mymat)[GTcols])
selectedCols
[1] "00860" "00861"
# Create ".DP" columns
mymat[, paste0(selectedCols, ".DP") := 50, with = F]
mymat
rn 00860.GT 00861.GT 00860.DP 00861.DP
1: chr1:1163804 0/1 0/0 50 50
2: chr1:1888193 1/1 0/0 50 50
# Create "GT" to "AD" mapping
GTToADMapping = c("50,0", "25/25", "0/50")
names(GTToADMapping) = c("0/0", "0/1", "1/1")
GTToADMapping
0/0 0/1 1/1
"50,0" "25/25" "0/50"
# This function will return the "AD" mapping given the values of "GT"
mapGTToAD <- function(x){
return (GTToADMapping[x])
}
# Here, we create the AD columns using the GT mapping
mymat[, (paste0(selectedCols, ".AD")) := lapply(.SD, mapGTToAD), with = F,
.SDcols = colnames(mymat)[GTcols]]
rn 00860.GT 00861.GT 00860.DP 00861.DP 00860.AD 00861.AD
1: chr1:1163804 0/1 0/0 50 50 25/25 50,0
2: chr1:1888193 1/1 0/0 50 50 0/50 50,0
# We can sort the data now as you have it
colOrder = as.vector(rbind(paste0(selectedCols, ".GT"),
paste0(selectedCols, ".AD"),
paste0(selectedCols, ".DP")))
mymat = mymat[, c("rn", colOrder), with = F]
mymat
rn 00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
1: chr1:1163804 0/1 25/25 50 0/0 50,0 50
2: chr1:1888193 1/1 0/50 50 0/0 50,0 50
# Put it back in the format you had
mymat2 = as.matrix(mymat[,-1, with = F])
rownames(mymat2) = mymat$rn
mymat2
00860.GT 00860.AD 00860.DP 00861.GT 00861.AD 00861.DP
chr1:1163804 "0/1" "25/25" "50" "0/0" "50,0" "50"
chr1:1888193 "1/1" "0/50" "50" "0/0" "50,0" "50"
您真的需要處理多少個樣本(列)和chr1(行)? – aichao
@aichao它像2000個樣本列和超過100000行。 – MAPK