2012-05-28 54 views
1

Specific substrings in R: getting a sequence

I created the following matrix:

positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4)) 

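I also have the following string (a fixed-width PDB SEQRES record, in which the first residue name occupies columns 20-22):

input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO   "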

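I am trying to use the apply function to build a list of substrings, substr(mystring, start.position, end.position), where the start index comes from positions[,1] and the end index from positions[,2]. I can do this easily with a for loop, but I thought apply would be faster.

I got it working as follows, but I wonder whether there is a cleaner way: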

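# Each row arrives as a character vector (start, end, string), because
# cbind() with a string coerces the whole matrix to character.
get.AA.seqres <- function(items) {
  start.position <- as.numeric(items[1])
  end.position <- as.numeric(items[2])
  string <- items[3]
  substr(string, start.position, end.position)
}

# The function must be defined before apply() is called.
parse.me <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4), input)
apply(parse.me, MARGIN = 1, get.AA.seqres)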
+1

Why don't you split on whitespace and discard the first three elements? – Andrie

+0

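Elements of a PDB file are defined by columns, not by whitespace. Since the specification refers explicitly to column numbers, I am hesitant to split on whitespace. Thanks anyway! – user1357015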

Answers

3

Try this:

> substring(input, positions[, 1], positions[, 2]) 
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO" 
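
This works because substring() is vectorized over its first and last arguments and recycles the single string to match them. A minimal self-contained sketch follows; it assumes the input keeps the fixed-width PDB column layout (the literal below is written so that residue names fall in columns 20-22, 24-26, ..., 68-70), and the names starts, ends and residues are only illustrative:

```r
# Assumed input: a SEQRES record with its original fixed-width spacing,
# so that the first residue name starts at column 20.
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO   "

# Start and end columns of each three-letter residue name.
starts <- seq(from = 20, to = 68, by = 4)
ends <- starts + 2

# substring() recycles the single string across all start/end pairs.
residues <- substring(input, starts, ends)
print(residues)
```

If the whitespace in the record has been collapsed (for example by copy and paste), the column positions no longer line up and the result will not be the residue names.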
0

I like Andrie's practical suggestion, but if you need to go down this route for some other reason, your problem sounds like one that Vectorize() can solve:

#Your data 
positions = cbind(seq(from = 20, to = 68, by = 4),seq(from = 22, to = 70, by = 4)) 
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO   "

#Vectorize the function substr() 
vsubstr <- Vectorize(substr, USE.NAMES = FALSE) 
vsubstr(input, positions[,1], positions[,2]) 
#----- 
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO" 

#Or, read the help page on ?substr about the bit for recycling in the first paragraph of details 

substr(rep(input, nrow(positions)), positions[,1], positions[,2]) 
#----- 
[1] "THR" "PHE" "GLY" "SER" "GLY" "GLU" "ALA" "ASP" "CYS" "GLY" "LEU" "ARG" "PRO"
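
Since Vectorize() is a wrapper around mapply(), the same result can also be obtained by calling mapply() directly. The sketch below is only an illustration under the same assumption as above (fixed-width input with residues starting at column 20); via_mapply and via_substring are hypothetical names used for the comparison:

```r
# Assumed input with the original fixed-width PDB column spacing.
input <- "SEQRES   1 L   36  THR PHE GLY SER GLY GLU ALA ASP CYS GLY LEU ARG PRO   "
positions <- cbind(seq(from = 20, to = 68, by = 4), seq(from = 22, to = 70, by = 4))

# mapply() calls substr() once per (start, stop) pair, recycling the string.
via_mapply <- mapply(substr, input, positions[, 1], positions[, 2], USE.NAMES = FALSE)

# The vectorized substring() call, for comparison.
via_substring <- substring(input, positions[, 1], positions[, 2])

print(all(via_mapply == via_substring))
```

For a single string and a handful of positions the difference is negligible; substring() is simply the shorter spelling, while mapply() generalizes to functions that are not already vectorized.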