2017-06-26 27 views
3

Need help extending a function and for loop in R

I have the following function with a for loop:

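getSequences <- function(input.seq){ 
    peptide.result <- c() 
    # peptides.df is read from the calling environment: one row per peptide
    for (i in 1:nrow(peptides.df)) { 
        # extract the peptide between its start and end positions
        peptide.seq <- substr(input.seq, peptides.df$StartAA[i], peptides.df$EndAA[i]) 
        peptide.info <- data.frame(peptide.name = peptides.df$Name[i], peptide.sequence = peptide.seq, stringsAsFactors = FALSE) 
        peptide.result <- rbind(peptide.result, peptide.info) 
    } 
    return(peptide.result) 
} 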

test.results <- getSequences(input.seq) 

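The function takes an amino acid sequence as input and, using a data frame of peptide start and stop positions, extracts subsets of the sequence at different positions to generate a set of peptides.

Example amino acid sequence: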

input.seq <- ("MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE") 

Here is what the first few rows of peptides.df look like:

Name StartAA EndAA 
peptide_1 25 48 
peptide_2 33 56 
peptide_3 41 64 

Current output, peptide.result:

peptide.name peptide.sequence 
peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
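
For reference, the same single-sequence extraction can be sketched without an explicit loop, because `substring()` recycles a single string across vectors of start and end positions. This is only an illustration: `getSequencesVec` is a hypothetical name, and passing the peptide table in as an argument (rather than reading the global `peptides.df`) is an assumption about how you might want to structure it.

```r
# Hypothetical loop-free variant of getSequences().
# `peptides` is assumed to have the columns Name, StartAA and EndAA.
getSequencesVec <- function(input.seq, peptides) {
  data.frame(
    peptide.name = peptides$Name,
    # substring() returns one substring per (start, end) pair
    peptide.sequence = substring(input.seq, peptides$StartAA, peptides$EndAA),
    stringsAsFactors = FALSE
  )
}

peptides.df <- data.frame(
  Name = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA = c(48, 56, 64),
  stringsAsFactors = FALSE
)
input.seq <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"

test.results <- getSequencesVec(input.seq, peptides.df)
print(test.results)
```

Note that `substr()` would not work here in the same way, since it returns a result only as long as its first argument; `substring()` is the variant that expands to the longest argument.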

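How can I extend this to take a data frame containing sample numbers and their sequences? For each sample and its sequence, I want to generate a set of peptides, just as in this example.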

New input: the sample_sequences data frame (input sequences for 200 samples)

sample1  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
sample2  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
sample3  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
... 
sample200 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 

New output: sample_peptides

sample1 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample1 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample1 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
sample2 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample2 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample2 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
sample3 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample3 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample3 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
... 
sample200 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample200 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample200 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
+0

A simple change: `for(s in sample_sequences){getSequences()}`, right?

+1

`sapply(df$sample_sequences, getSequences)` should do it, although the output format will be slightly different.
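
To make the apply-style suggestion concrete, here is a rough base R sketch that processes one sample at a time and then stacks the pieces into the long format the question asks for. It assumes the sample table has columns named `sample` and `sequence` (the question does not name them), and it uses `Map()` with `do.call(rbind, ...)` rather than `sapply()` so that the sample ID can be carried along with each peptide.

```r
peptides.df <- data.frame(
  Name = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA = c(48, 56, 64),
  stringsAsFactors = FALSE
)

# Assumed column names: `sample` and `sequence`
input.seq <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"
sample_sequences <- data.frame(
  sample = c("sample1", "sample2", "sample3"),
  sequence = rep(input.seq, 3),
  stringsAsFactors = FALSE
)

# Build one small data frame of peptides per sample
per.sample <- Map(function(id, s) {
  data.frame(
    sample = id,
    Name = peptides.df$Name,
    peptide.sequence = substring(s, peptides.df$StartAA, peptides.df$EndAA),
    stringsAsFactors = FALSE
  )
}, sample_sequences$sample, sample_sequences$sequence)

# Stack the per-sample pieces and drop the generated row names
sample_peptides <- do.call(rbind, per.sample)
rownames(sample_peptides) <- NULL
print(sample_peptides)
```

With 200 samples, `do.call(rbind, ...)` binds everything in one call, which avoids growing a data frame row by row inside a loop.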

+0

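Hi everyone, thanks for your help. I ended up changing my approach to the problem to meet the users' needs. However, I am working through each of your suggestions because I am trying to learn R programming. My new approach is to subset the input sequence based on two coordinates (Coord1, Coord2) entered by the user. library(dplyr) subset.sample.seq <- sample_sequences %>% mutate(Sequence = subset(Sequence, Coord1, Coord2) – tkh86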

Answer

0

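You can avoid loops with tidyr and dplyr. Use crossing to expand sample_sequences against all possible peptides. After that, it is just a matter of applying substr across the whole table, where you would otherwise have wanted another loop over your `getSequences` function.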

library(dplyr);library(tidyr) 
peptides.df <- read.table(text=" Name StartAA EndAA 
peptide_1 25 48 
peptide_2 33 56 
peptide_3 41 64",header=TRUE,stringsAsFactors=FALSE) 

sample_sequences <-read.table(text=" sample sequence 
sample1  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
sample2  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
sample3  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE",header=TRUE,stringsAsFactors=FALSE) 

crossing(sample_sequences,peptides.df)%>% 
    mutate(peptide.sequence=substr(sequence, StartAA, EndAA)) 

    sample               sequence  Name StartAA EndAA   peptide.sequence 
1 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1  25 48 QNYWEHPYQNSDVYRPINEHREHP 
2 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2  33 56 QNSDVYRPINEHREHPKEYEYPLH 
3 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3  41 64 INEHREHPKEYEYPLHQEHTYQQE 
4 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1  25 48 QNYWEHPYQNSDVYRPINEHREHP 
5 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2  33 56 QNSDVYRPINEHREHPKEYEYPLH 
6 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3  41 64 INEHREHPKEYEYPLHQEHTYQQE 
7 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1  25 48 QNYWEHPYQNSDVYRPINEHREHP 
8 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2  33 56 QNSDVYRPINEHREHPKEYEYPLH 
9 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3  41 64 INEHREHPKEYEYPLHQEHTYQQE
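
If tidyr is not available, the same cross-join idea can be sketched in base R. This is a substitute technique, not the code above: it relies on `merge()` with `by = NULL`, which returns the Cartesian product of two data frames, and then keeps only the three columns shown in the requested output. The column names `sample` and `sequence` are the ones assumed in this answer.

```r
peptides.df <- data.frame(
  Name = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA = c(48, 56, 64),
  stringsAsFactors = FALSE
)

input.seq <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"
sample_sequences <- data.frame(
  sample = c("sample1", "sample2", "sample3"),
  sequence = rep(input.seq, 3),
  stringsAsFactors = FALSE
)

# by = NULL asks merge() for every peptide/sample combination
crossed <- merge(peptides.df, sample_sequences, by = NULL)

# substr() is applied row by row: one sequence, one start, one end per row
crossed$peptide.sequence <- substr(crossed$sequence, crossed$StartAA, crossed$EndAA)

# Keep only the columns from the requested output
sample_peptides <- crossed[, c("sample", "Name", "peptide.sequence")]
rownames(sample_peptides) <- NULL
print(sample_peptides)
```

Putting `peptides.df` first in the `merge()` call makes the peptides vary fastest, so the rows come out grouped by sample as in the question.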