2014-01-13 124 views

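Using R to convert a string to a variable name and replace a variable

I have read all of the 'string to variable name' posts, but none of them deals with my particular problem. I have a list of vectors (DNA sequence data) made with 'read.fasta' from the seqinr package. I also have a data frame of variants and their positions, and I want to change the list's vector elements at the positions given in the data frame to their alternate values. For a single element this can be done with: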

list$name[number] <- alternate.character 

# I tried 
for (i in 1:length(df$CHROM)) 
    if (is.na(df$Call[i])) {next} else {get(paste("test$",df$CHROM[i],"[",df$POS[i],"]",sep="")) <- df$Call[i]} 

# example data 
test <- list("One" = c("a","t","a","g","c"), 
       "Two" = c("g","a","t","t","a","c","a")) 
df <- data.frame("CHROM"=c(rep("One",2),rep("Two",3)), 
      "POS" = c(2,4,1,3,6), 
      "REF" = c("t","g","g","t","c"), 
      "ALT" = c("a","a","t","g","t"), 
      "Call" = c("T","A","G",NA,"T")) 

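I tried the loop above, but `get` returns the vector element from the list and does not let me assign to it as a variable inside the parent list.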

What I want is to go from

$One 
[1] "a" "t" "a" "g" "c" 

$Two 
[1] "g" "a" "t" "t" "a" "c" "a" 

to

$One 
[1] "a" "T" "a" "A" "c" 

$Two 
[1] "G" "a" "t" "t" "a" "T" "a" 

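For the test data this is not a problem, because each replacement can be done individually, but the real data has more than 10,000 sequences and more than 100,000 variants. Bonus points if you can vectorise it; I do not have enough experience nesting apply functions to make them work with information from both the list and the data frame at the same time.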

sessionInfo() 
R version 3.0.2 (2013-09-25) 
Platform: x86_64-pc-linux-gnu (64-bit) 

locale: 
[1] LC_CTYPE=en_GB.UTF-8  LC_NUMERIC=C    
[3] LC_TIME=en_GB.UTF-8  LC_COLLATE=en_GB.UTF-8  
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8 
[7] LC_PAPER=en_GB.UTF-8  LC_NAME=C     
[9] LC_ADDRESS=C    LC_TELEPHONE=C    
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C  

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] seqinr_3.0-7 

loaded via a namespace (and not attached): 
[1] tools_3.0.2 

Are the REF and ALT columns irrelevant here? – Spacedman

Yes, for this part they are irrelevant; they were used earlier to obtain the 'Call' character. – JeremyS

Continuing with your approach, it would look like `for(i in seq_len(nrow(df))) {if(!is.na(as.character(df$Call[i]))) test[[as.character(df$CHROM[i])]][as.numeric(as.character(df$POS[i]))] <- as.character(df$Call[i])}; test` –
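
The loop from the comment above is easier to follow when laid out as a block. This is a minimal sketch run against the question's example data, not a definitive version: the REF and ALT columns are left out because they are not used here, and `stringsAsFactors = FALSE` is an assumption added so that `CHROM` and `Call` stay character and the `as.character()` conversions can be dropped.

```r
# Example data from the question (REF/ALT omitted).
# stringsAsFactors = FALSE is an assumption made for this sketch.
test <- list("One" = c("a", "t", "a", "g", "c"),
             "Two" = c("g", "a", "t", "t", "a", "c", "a"))
df <- data.frame(CHROM = c(rep("One", 2), rep("Two", 3)),
                 POS   = c(2, 4, 1, 3, 6),
                 Call  = c("T", "A", "G", NA, "T"),
                 stringsAsFactors = FALSE)

for (i in seq_len(nrow(df))) {
  if (is.na(df$Call[i])) next   # skip variants without a call
  # `[[` accepts the element name as a string, so get()/paste() is not needed
  test[[df$CHROM[i]]][df$POS[i]] <- df$Call[i]
}

test
```

The key point is that `test[[name]]` is the string-indexed equivalent of `test$name`, and it can appear on the left-hand side of an assignment, which `get()` cannot.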

Answer

You can use `sapply` for this task:

res <- sapply(names(test), function(x) { 
    tmp <- df[df$CHROM == x & !is.na(df$Call), ] 
    replace(test[[x]], tmp$POS, as.character(tmp$Call)) 
}) 


res 
# $One 
# [1] "a" "T" "a" "A" "c" 
# 
# $Two 
# [1] "G" "a" "t" "t" "a" "T" "a" 
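
One caveat with `sapply`: when every sequence has the same length it simplifies the result to a matrix instead of a list, and the answer's version subsets `df` once per sequence. The variant below is only a sketch of one way around both points, grouping the calls once with `split()` and applying them with `Map()`. The names `ok`, `by_chrom`, and `hits` are made up for this illustration, and `stringsAsFactors = FALSE` is an assumption; nothing here has been timed on data of the size mentioned in the question.

```r
# Example data from the question (REF/ALT omitted).
# stringsAsFactors = FALSE is an assumption made for this sketch.
test <- list("One" = c("a", "t", "a", "g", "c"),
             "Two" = c("g", "a", "t", "t", "a", "c", "a"))
df <- data.frame(CHROM = c(rep("One", 2), rep("Two", 3)),
                 POS   = c(2, 4, 1, 3, 6),
                 Call  = c("T", "A", "G", NA, "T"),
                 stringsAsFactors = FALSE)

ok <- df[!is.na(df$Call), ]          # drop variants without a call
by_chrom <- split(ok, ok$CHROM)      # one data frame of calls per sequence name

# Only touch sequences that actually have calls
hits <- intersect(names(by_chrom), names(test))

res <- test
res[hits] <- Map(function(s, v) replace(s, v$POS, as.character(v$Call)),
                 test[hits], by_chrom[hits])

res
```

If you prefer to keep the original shape of the answer, passing `simplify = FALSE` to `sapply` also guarantees a list result.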

This is exactly what I wanted. It looks so simple. Thanks. – JeremyS