Need help extending a function and for loop in R

I have the following function with a for loop:

getSequences <- function(input.seq) {
    peptide.result <- c()
    for (i in 1:nrow(peptides.df)) {
        peptide.seq <- substr(input.seq, peptides.df$StartAA[i], peptides.df$EndAA[i])
        peptide.info <- data.frame(cbind(peptide.name = peptides.df$Name[i], peptide.seq))
        peptide.result <- rbind(peptide.result, peptide.info)
    }
    return(peptide.result)
}

test.results <- getSequences(input.seq)

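The function takes an amino acid sequence as input and, using a matrix of peptide start and stop positions, extracts subsets of the sequence at different positions to generate a set of peptides.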

Example amino acid sequence:

input.seq <- ("MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE")

Here is what the first few rows of peptides.df look like:

Name StartAA EndAA 
peptide_1 25 48 
peptide_2 33 56 
peptide_3 41 64

Current output, peptide.result:

peptide.name peptide.sequence 
peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
peptide_3 INEHREHPKEYEYPLHQEHTYQQE

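How can I extend this to take a data frame that holds a sample # and its sequence? For each sample # and its sequence, I would like to generate a set of peptides, just like in this example.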

New input: the sample_sequences data frame (200 sample input sequences)

sample1  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
sample2  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
sample3  MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE 
... 
sample200 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE

New output: sample_peptides

sample1 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample1 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample1 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
sample2 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample2 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample2 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
sample3 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample3 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample3 peptide_3 INEHREHPKEYEYPLHQEHTYQQE 
... 
sample200 peptide_1 QNYWEHPYQNSDVYRPINEHREHP 
sample200 peptide_2 QNSDVYRPINEHREHPKEYEYPLH 
sample200 peptide_3 INEHREHPKEYEYPLHQEHTYQQE

Source: tkh86, 2017-06-26

Simple mutate. `for (s in sample_sequences) { getSequences() }`, right?

`sapply(df$sample_sequences, getSequences)` should do it, though the output format will be slightly different.
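To illustrate the apply-style suggestion above, here is a minimal base-R sketch. It assumes `sample_sequences` has columns named `sample` and `sequence`, and `getSequencesFor` is a hypothetical helper that is not part of the original code. It uses `substring`, which recycles a single string across vectors of start and stop positions, so no inner loop is needed. `mapply(..., SIMPLIFY = FALSE)` followed by `rbind` yields a long data frame instead of the matrix-like result `sapply` would return.

```r
# Peptide positions from the question
peptides.df <- data.frame(
  Name    = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA   = c(48, 56, 64),
  stringsAsFactors = FALSE
)

# Assumed layout of the input: one row per sample (three shown)
seq64 <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"
sample_sequences <- data.frame(
  sample   = c("sample1", "sample2", "sample3"),
  sequence = rep(seq64, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all peptides for a single sample
getSequencesFor <- function(sample.name, input.seq, peptides) {
  data.frame(
    sample           = sample.name,
    peptide.name     = peptides$Name,
    # substring() recycles input.seq over every start/stop pair
    peptide.sequence = substring(input.seq, peptides$StartAA, peptides$EndAA),
    stringsAsFactors = FALSE
  )
}

# One data frame per sample, then stack them
per.sample <- mapply(
  getSequencesFor,
  sample_sequences$sample,
  sample_sequences$sequence,
  MoreArgs = list(peptides = peptides.df),
  SIMPLIFY = FALSE
)
sample_peptides <- do.call(rbind, per.sample)
rownames(sample_peptides) <- NULL
```

Passing `peptides` as an argument, rather than reading a global `peptides.df` as the original function does, keeps the helper easy to test on its own.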

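Hi all, thanks for your help. I ended up changing my approach to the problem to meet the user's needs. However, I am working through each of your suggestions, since I am trying to learn R programming. My new approach is to subset the input sequences based on two user-supplied coordinates (Coord1, Coord2). `library(dplyr); subset.sample.seq <- sample_sequences %>% mutate(Sequence = subset(Sequence, Coord1, Coord2))` – tkh86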
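The snippet in the comment above calls `subset`, which selects elements or rows by a condition rather than extracting characters by position; `substr` is presumably what was meant. A minimal base-R sketch under that assumption follows. The `Coord1` and `Coord2` values and the `Sample` column name are made up for illustration.

```r
# Assumed input layout; only two samples shown
seq64 <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"
sample_sequences <- data.frame(
  Sample   = c("sample1", "sample2"),
  Sequence = rep(seq64, 2),
  stringsAsFactors = FALSE
)

# Hypothetical user-supplied coordinates
Coord1 <- 25
Coord2 <- 48

# substr() works element-wise on the Sequence column
subset.sample.seq <- transform(
  sample_sequences,
  Sequence = substr(Sequence, Coord1, Coord2)
)
```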

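You can avoid the loop with tidyr and dplyr. Use `crossing` to expand sample_sequences with every possible peptide. After that, it is just a matter of applying `substr` at a higher level, rather than inside another loop in your `getSequences` function.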

library(dplyr)
library(tidyr)

# Peptide start/stop positions
peptides.df <- read.table(text = "Name StartAA EndAA
peptide_1 25 48
peptide_2 33 56
peptide_3 41 64", header = TRUE, stringsAsFactors = FALSE)

# One row per sample (three shown here)
sample_sequences <- read.table(text = "sample sequence
sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE
sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE", header = TRUE, stringsAsFactors = FALSE)

# Pair every sample with every peptide, then extract each peptide with substr
crossing(sample_sequences, peptides.df) %>%
    mutate(peptide.sequence = substr(sequence, StartAA, EndAA))

   sample                                                         sequence      Name StartAA EndAA         peptide.sequence
1 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1      25    48 QNYWEHPYQNSDVYRPINEHREHP
2 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2      33    56 QNSDVYRPINEHREHPKEYEYPLH
3 sample1 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3      41    64 INEHREHPKEYEYPLHQEHTYQQE
4 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1      25    48 QNYWEHPYQNSDVYRPINEHREHP
5 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2      33    56 QNSDVYRPINEHREHPKEYEYPLH
6 sample2 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3      41    64 INEHREHPKEYEYPLHQEHTYQQE
7 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_1      25    48 QNYWEHPYQNSDVYRPINEHREHP
8 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_2      33    56 QNSDVYRPINEHREHPKEYEYPLH
9 sample3 MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE peptide_3      41    64 INEHREHPKEYEYPLHQEHTYQQE
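For readers without tidyr, the same cross-then-extract idea can be sketched in base R. This is an illustrative alternative, not the answer's method: `merge` with `by = NULL` returns the Cartesian product of the two data frames, and `substr` then runs element-wise over the resulting columns. The column names follow the example above.

```r
peptides.df <- data.frame(
  Name    = c("peptide_1", "peptide_2", "peptide_3"),
  StartAA = c(25, 33, 41),
  EndAA   = c(48, 56, 64),
  stringsAsFactors = FALSE
)

seq64 <- "MRKLYCVLLLSAFEFTYMINFGRGQNYWEHPYQNSDVYRPINEHREHPKEYEYPLHQEHTYQQE"
sample_sequences <- data.frame(
  sample   = c("sample1", "sample2", "sample3"),
  sequence = rep(seq64, 3),
  stringsAsFactors = FALSE
)

# Cartesian product: every sample paired with every peptide
crossed <- merge(sample_sequences, peptides.df, by = NULL)

# sequence, StartAA and EndAA now have equal length, so substr is element-wise
crossed$peptide.sequence <- substr(crossed$sequence, crossed$StartAA, crossed$EndAA)

# Sort by sample then peptide, and keep only the requested output columns
crossed <- crossed[order(crossed$sample, crossed$Name), ]
sample_peptides <- crossed[, c("sample", "Name", "peptide.sequence")]
rownames(sample_peptides) <- NULL
```

Note that `order` sorts the sample names as text, so with 200 samples `sample10` would come before `sample2`; sorting on a numeric sample index would avoid that.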

Source: 2017-06-26 14:02:40
