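Apply statement to sample columns, across rows of different lengths

I am trying to write a simple R function that samples 5-element substrings across two columns of a single data frame. The strings have the same length within each row, but their lengths differ from row to row. The function works when I specify the row and column myself, but I cannot get an apply statement to process every row and column. As written, it only draws the random sample based on the length of the first instance, so if the first instance is shorter than any of the other strings, the output for the other rows sometimes has fewer than 5 elements.
Example df: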
BP TF
1 CGTCTCTATTCTAGGCAAGA TTTFFFFTFFFTFFFTFTTT
2 AAGTCACTCGAATTCGGATGCCCCCTAGGC TTFFFFFTFFFFTTFTFFTTTFTTTTFTFF
3 TGCTCATGACGGGAC FFFTFTFFFFTFTFT
Expected output:
1 CTATT FFTFF
2 CCTAG TTTFT
3 TCATG TFTFF
Reproducible example code:
#make fake data frame
BaseP1 <- paste(sample(size = 20, x = c("A","C","T","G"), replace = TRUE), collapse = "")
BaseP2 <- paste(sample(size = 30, x = c("A","C","T","G"), replace = TRUE), collapse = "")
BaseP3 <- paste(sample(size = 15, x = c("A","C","T","G"), replace = TRUE), collapse = "")
TrueFalse1 <- paste(sample(size = 20, x = c("T","F"), replace = TRUE), collapse = "")
TrueFalse2 <- paste(sample(size = 30, x = c("T","F"), replace = TRUE), collapse = "")
TrueFalse3 <- paste(sample(size = 15, x = c("T","F"), replace = TRUE), collapse = "")
my_df <- data.frame(BP = c(BaseP1, BaseP2, BaseP3), TF = c(TrueFalse1, TrueFalse2, TrueFalse3), stringsAsFactors = FALSE)
Fragment = function(string) {
nStart = sample(1:nchar(string) -5, 1)
substr(string, nStart, nStart + 4)
}
Fragment(string = my_df[1,1])#works for the first row, first col.
But this does not work:
apply(my_df, c(1,2), function(x) Fragment(string = my_df[1:nrow(my_df),1:ncol(my_df)]))
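A possible explanation, offered as a sketch rather than a definitive answer. Two things seem to go wrong. First, the anonymous function ignores its argument `x` and passes the whole data frame to `Fragment` on every call. Second, `:` binds more tightly than `-` in R, so `1:nchar(string) -5` is `(1:nchar(string)) - 5`, which includes zero and negative start positions and so can yield substrings shorter than 5. The name `fragment_fixed`, the `width` argument and the fixed seed below are assumptions made for illustration and are not part of the original code.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is repeatable

# Stand-in for the question's data frame, using the example strings shown above
my_df <- data.frame(
  BP = c("CGTCTCTATTCTAGGCAAGA",
         "AAGTCACTCGAATTCGGATGCCCCCTAGGC",
         "TGCTCATGACGGGAC"),
  TF = c("TTTFFFFTFFFTFFFTFTTT",
         "TTFFFFFTFFFFTTFTFFTTTFTTTTFTFF",
         "FFFTFTFFFFTFTFT"),
  stringsAsFactors = FALSE
)

# Operator precedence: 1:n - 5 is (1:n) - 5, not 1:(n - 5)
bad_starts  <- 1:nchar(my_df$BP[1]) - 5         # -4, -3, ..., 15
good_starts <- seq_len(nchar(my_df$BP[1]) - 4)  # 1, 2, ..., 16

# Hypothetical corrected helper: valid starts are 1 .. (nchar - width + 1).
# It errors if a string is shorter than `width`.
fragment_fixed <- function(string, width = 5) {
  n_start <- sample.int(nchar(string) - width + 1, 1)
  substr(string, n_start, n_start + width - 1)
}

# With MARGIN = c(1, 2), apply() hands each single cell to the function,
# so the function can be passed directly instead of re-indexing my_df
res <- apply(my_df, c(1, 2), fragment_fixed)
print(res)
```

Each cell gets its own independent start position here, so the two columns in a row are generally not sampled from the same window.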
Isn't this what you want? `apply(my_df, c(1,2), Fragment)` – JAD
Simply `sapply(my_df, Fragment)` – Sotos
@JarkoDubbeldam No, that produces draws shorter than 5 elements. – user8173816
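In the expected output, both columns of a row appear to be cut at the same positions (for row 1, characters 6 to 10 of each string). Assuming that is the intent, one way to sketch it is to draw a single start position per row and reuse it for both columns. `substr()` recycles its `start` and `stop` arguments, so no explicit loop over cells is needed. The names `width`, `starts` and `fragments`, and the fixed seed, are assumptions made for illustration.

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is repeatable

# Stand-in for the question's data frame, using the example strings shown above
my_df <- data.frame(
  BP = c("CGTCTCTATTCTAGGCAAGA",
         "AAGTCACTCGAATTCGGATGCCCCCTAGGC",
         "TGCTCATGACGGGAC"),
  TF = c("TTTFFFFTFFFTFFFTFTTT",
         "TTFFFFFTFFFFTTFTFFTTTFTTTTFTFF",
         "FFFTFTFFFFTFTFT"),
  stringsAsFactors = FALSE
)

width <- 5

# One start per row, drawn from 1 .. (row length - width + 1).
# Rows differ in length, so each row needs its own draw.
starts <- vapply(
  nchar(my_df$BP),
  function(n) sample.int(n - width + 1, 1),
  integer(1)
)

# The same window is applied to both columns of each row
fragments <- data.frame(
  BP = substr(my_df$BP, starts, starts + width - 1),
  TF = substr(my_df$TF, starts, starts + width - 1),
  stringsAsFactors = FALSE
)
print(fragments)
```

This relies on the two strings in a row having equal length, as stated in the question. If they could differ, the start would have to be bounded by the shorter of the two.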