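Unique combinations of columns, based on criteria from one row

I have a data.frame with more than 200 columns, including the following subset of columns relevant to this question: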
>df
Variant Pos ID DB.0.count DB.1.count sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
variant5 1234567 A 5 5 1/0 1/0 1/0 1/1 1/1 0/0 1/0 0/0 1/0 1/1
. . . . . F1 F1 F1 F2 F2 F3 F4 F4 F4 F5
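I would like to:
1. Build all possible combinations of the sample1-sample10 columns in which each combination contains exactly one sample from each F number, i.e. each combination contains 5 samples, one each from F1, F2, F3, F4 and F5.
In the example above there would be 18 combinations (3 x 2 x 1 x 3 x 1), for example:
The first combination would be sample1, sample4, sample6, sample7, sample10
The second combination would be sample1, sample4, sample6, sample8, sample10
The third combination would be sample1, sample4, sample6, sample9, sample10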
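One way to enumerate these combinations is sketched below with `expand.grid`. This is only a sketch under assumptions: the vector `fam` is a hypothetical, hand-typed copy of the F labels from the second row of the example, and `by_family` and `combos` are illustrative names that do not appear in the original code.

```r
# Family label for each sample column (typed in by hand from the example's F row)
fam <- c(sample1 = "F1", sample2 = "F1", sample3 = "F1",
         sample4 = "F2", sample5 = "F2",
         sample6 = "F3",
         sample7 = "F4", sample8 = "F4", sample9 = "F4",
         sample10 = "F5")

# Group the sample names by family: a list with one character vector per F number
by_family <- split(names(fam), fam)

# expand.grid builds every way of picking one element from each vector.
# It varies its first argument fastest, so the list is reversed and the columns
# are put back in F1..F5 order to match the ordering used in the question.
combos <- expand.grid(rev(by_family), stringsAsFactors = FALSE)[, names(by_family)]

nrow(combos)   # number of combinations
combos[1:3, ]  # first three combinations
```

Each row of `combos` holds the five column names of one combination, so `df[, c("Variant", "Pos", "ID", "DB.0.count", "DB.1.count", unlist(combos[i, ]))]` would select the columns for combination `i`.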
I have played around with unique, duplicated and 0123 after reading related posts, but got nowhere.
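Then I would like to output each unique combination to a new data.frame, count the genotypes and alleles across the samples for each variant, write the results to new columns, and then run a Fisher's exact test and write the p-value to a new column. The code below should do this (the Fisher code was learned from: Fisher's exact test on values from large dataframe and bypassing errors):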
# Column names containing "/" are not syntactic in R, so they must be quoted with backticks
df.combo.1$`pop.0/0.count` <- apply(df.combo.1[, 6:10], 1, function(u) sum(grepl("0/0", u)))
df.combo.1$`pop.1/0.count` <- apply(df.combo.1[, 6:10], 1, function(u) sum(grepl("1/0", u)))
df.combo.1$`pop.1/1.count` <- apply(df.combo.1[, 6:10], 1, function(u) sum(grepl("1/1", u)))
# Allele counts: each homozygote contributes two alleles, each heterozygote one.
# Heterozygotes are written "1/0" in this data, so that pattern is used here (not "0/1").
df.combo.1$pop.0.count <- 2 * df.combo.1$`pop.0/0.count` + df.combo.1$`pop.1/0.count`
df.combo.1$pop.1.count <- 2 * df.combo.1$`pop.1/1.count` + df.combo.1$`pop.1/0.count`
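# one p-value per row; NA marks rows where the test could not be run
res <- rep(NA_real_, nrow(df.combo.1))
for (i in seq_len(nrow(df.combo.1))) {
  # 2x2 table: database allele counts (columns 4-5) vs. population allele counts (columns 14-15)
  # (named tab rather than table, to avoid masking base::table)
  tab <- matrix(c(df.combo.1[i, 4], df.combo.1[i, 5], df.combo.1[i, 14], df.combo.1[i, 15]), ncol = 2, byrow = TRUE)
  # skip the test if any NA occurs in the table, otherwise store the Fisher p-value
  if (!any(is.na(tab))) res[i] <- fisher.test(tab)$p.value
}
df.combo.1$fishers <- res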
>df.combo.1
Variant Pos ID DB.0.count DB.1.count sample1 sample4 sample6 sample7 sample10 pop.0/0.count pop.1/0.count pop.1/1.count pop.0.count pop.1.count fishers
variant5 1234567 A 5 5 1/0 1/1 0/0 1/0 1/1 1 2 2 4 6 1.0000
. . . . . F1 F2 F3 F4 F5
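As a cross-check of the counts shown for the first combination, here is a minimal sketch that computes the allele counts without hard-coded column positions. The helper name `count_alleles` is hypothetical, the toy `df` holds only the single example variant (the F row is left out), and treating both "1/0" and "0/1" as heterozygous is an assumption, since the example data only contains "1/0".

```r
# Toy copy of the example variant (genotype row only)
df <- data.frame(
  Variant = "variant5", Pos = 1234567, ID = "A",
  DB.0.count = 5, DB.1.count = 5,
  sample1 = "1/0", sample2 = "1/0", sample3 = "1/0", sample4 = "1/1",
  sample5 = "1/1", sample6 = "0/0", sample7 = "1/0", sample8 = "0/0",
  sample9 = "1/0", sample10 = "1/1",
  stringsAsFactors = FALSE
)

# Hypothetical helper: allele counts per variant for one set of sample columns
count_alleles <- function(df, cols) {
  g <- as.matrix(df[, cols, drop = FALSE])           # genotypes as a character matrix
  het <- rowSums(g == "1/0" | g == "0/1", na.rm = TRUE)
  data.frame(
    pop.0.count = 2 * rowSums(g == "0/0", na.rm = TRUE) + het,
    pop.1.count = 2 * rowSums(g == "1/1", na.rm = TRUE) + het
  )
}

combo1 <- c("sample1", "sample4", "sample6", "sample7", "sample10")
counts <- count_alleles(df, combo1)
counts

# Fisher's exact test on database counts vs. population counts for the variant
tab <- matrix(c(df$DB.0.count, df$DB.1.count, counts$pop.0.count, counts$pop.1.count),
              ncol = 2, byrow = TRUE)
p <- fisher.test(tab)$p.value
```

Selecting columns by name rather than by position (6:10, 14, 15) means the same helper can be reused for every combination without the indices shifting.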
2. Finally, I would like to create a data.frame that lists the Fisher's exact p-value for each unique combination, as follows:
>new.df
combo fishers
1 1.0000
2 1.0000
3 1.0000
4 1.0000
etc
I think this whole exercise might need some kind of for loop?
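An explicit for loop is one option, but `apply` over the rows of a combination table may be enough. The following is a sketch only, assuming a single variant as in the example; `fam`, `combos` and `p_for_combo` are hypothetical names, and the F labels are typed in by hand rather than read from the second row of `df`.

```r
# Toy copy of the example variant and its family labels
df <- data.frame(
  DB.0.count = 5, DB.1.count = 5,
  sample1 = "1/0", sample2 = "1/0", sample3 = "1/0", sample4 = "1/1",
  sample5 = "1/1", sample6 = "0/0", sample7 = "1/0", sample8 = "0/0",
  sample9 = "1/0", sample10 = "1/1",
  stringsAsFactors = FALSE
)
fam <- c("F1", "F1", "F1", "F2", "F2", "F3", "F4", "F4", "F4", "F5")

# One row per combination: one sample name drawn from each family
combos <- expand.grid(split(paste0("sample", 1:10), fam), stringsAsFactors = FALSE)

# Hypothetical helper: Fisher p-value of the first variant for one combination
p_for_combo <- function(cols) {
  g <- unlist(df[1, cols])                     # genotypes of the chosen samples
  het <- sum(g %in% c("1/0", "0/1"))           # assumption: both spellings are heterozygous
  pop0 <- 2 * sum(g == "0/0") + het
  pop1 <- 2 * sum(g == "1/1") + het
  tab <- matrix(c(df$DB.0.count[1], df$DB.1.count[1], pop0, pop1),
                ncol = 2, byrow = TRUE)
  fisher.test(tab)$p.value
}

# apply() passes each row of combos (five column names) to the helper
new.df <- data.frame(
  combo = seq_len(nrow(combos)),
  fishers = apply(combos, 1, p_for_combo)
)
head(new.df)
```

For a data.frame with many variants, the helper would need to return one p-value per row instead of using row 1 only, and the result would then be a matrix of variants by combinations rather than a two-column table.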
Perfect, thank you so much! – emily