2012-02-03

I have a large set of variables and want to apply one function to several groups of columns. For example:

set.seed(14)
pool <- sample(c("AA", "AB", "BB"), 100, replace = TRUE)
mydf <- data.frame(M1 = pool[1:10],  M2 = pool[11:20],
                   M3 = pool[21:30], M4 = pool[31:40], M5 = pool[41:50],
                   M6 = pool[51:60], M7 = pool[61:70], M8 = pool[71:80],
                   M9 = pool[81:90], M10 = pool[91:100])

You need to install the package hapassoc if it is not already installed:

install.packages("hapassoc")

> library(hapassoc)
> example1.haplos <- pre.hapassoc(mydf, numSNPs = 3, allelic = FALSE)

Haplotypes will be based on the following SNPs (genotypic format): 
M8, M9, M10 
Remaining variables are: 
M1, M2, M3, M4, M5, M6, M7 

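It takes the last three variables as the haplotype group. But I need to break the data into smaller pieces by group and apply the function to each piece: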

M1, M2, M3 group 1 
M4, M5  group 2 
M6, M7, M8 group 3 
M9, M10  group 4 

So numSNPs would be given by the following vector:

nsp <- c(3, 2, 3, 2) 
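The nsp vector can be turned into a list of column positions per group with base R's split(). A minimal sketch, assuming the columns are ordered M1 to M10 as in mydf; the column names are built by hand here so the snippet runs on its own:

```r
# Group sizes from the question: M1-M3, M4-M5, M6-M8, M9-M10
nsp <- c(3, 2, 3, 2)

# Label each column position with its group number (1 1 1 2 2 3 3 3 4 4),
# then split the positions 1..10 by that label
grps <- split(seq_len(sum(nsp)), rep(seq_along(nsp), times = nsp))

# Map positions to column names; cols stands in for names(mydf)
cols <- paste0("M", 1:10)
grp_names <- lapply(grps, function(i) cols[i])
str(grp_names)
```

The resulting grps holds the same positions as a hand-written list(1:3, 4:5, 6:8, 9:10), so length(x) for each element equals the matching entry of nsp.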

I want to keep $haploMat for each group:

example1.haplos$haploMat 
haplo1 haplo2 
1 hBBA hBAB 
3 hAAB hABB 
4 hABA hABA 
6 hAAA hBBA 
7 hAAA hAAA 
8 hBBA hBBB 
9 hABB hBBB 
10 hABA hBAB 
12 hAAA hBBB 
13 hAAB hBBA 
14 hABA hABA 
15 hAAB hBAB 

The final output should have eight columns: group1.haplo1, group1.haplo2, group2.haplo1, group2.haplo2, group3.haplo1, group3.haplo2, group4.haplo1, group4.haplo2.

How can I do this?
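For reference, this layout can also be reached in base R by prefixing each group's columns and joining the per-group tables on subject ID with merge(). A minimal sketch, assuming each $haploMat is a data frame whose row names are subject IDs; g1 and g2 are made-up stand-ins, not real pre.hapassoc() output:

```r
# Hypothetical stand-ins for two groups' $haploMat (row names = subject IDs)
g1 <- data.frame(haplo1 = c("hABA", "hBAA"), haplo2 = c("hABB", "hAAB"),
                 row.names = c("1", "3"), stringsAsFactors = FALSE)
g2 <- data.frame(haplo1 = c("hBA", "hAB"), haplo2 = c("hBA", "hAB"),
                 row.names = c("1", "2"), stringsAsFactors = FALSE)
groups <- list(group1 = g1, group2 = g2)

# Rename haplo1 -> group1.haplo1 etc. and move the row names into an id column
prepped <- Map(function(d, nm) {
  names(d) <- paste(nm, names(d), sep = ".")
  d$id <- as.integer(rownames(d))
  d
}, groups, names(groups))

# Full outer join on id; subjects missing from a group get NA in its columns
wide <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), prepped)
wide
```

With four groups the same Reduce() call joins them all in order; all = TRUE keeps subjects that are absent from some groups instead of dropping them.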

Answer


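Is this what you're after? (Specify each group's column numbers as the elements of the list assigned to grps.) You'll need to install the reshape2 package. You could do something similar with rbind.fill() from the plyr package.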

set.seed(14)
pool <- sample(c("AA", "AB", "BB"), 100, replace = TRUE)
mydf <- data.frame(M1 = pool[1:10],  M2 = pool[11:20],
                   M3 = pool[21:30], M4 = pool[31:40], M5 = pool[41:50],
                   M6 = pool[51:60], M7 = pool[61:70], M8 = pool[71:80],
                   M9 = pool[81:90], M10 = pool[91:100])

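library(hapassoc)

# Column numbers of each group: M1-M3, M4-M5, M6-M8, M9-M10
grps <- list(1:3, 4:5, 6:8, 9:10)

# Run pre.hapassoc() on each group separately and keep only $haploMat
haplos <- lapply(grps, function(x) {
    out <- pre.hapassoc(mydf[, x], numSNPs = length(x), allelic = FALSE,
                        verbose = FALSE)$haploMat
    # Convert the row names (subject IDs) to numbers
    row.names(out) <- as.numeric(row.names(out))
    out
})

# Transpose so rows are haplo1/haplo2 and columns are subject IDs
haplos <- lapply(haplos, t)

library(reshape2)
# Long format: Var1 = haplo1/haplo2, Var2 = subject ID, L1 = group index
haplos <- melt(haplos, value.name = 'haplotype')
# Wide format: one row per subject, one column per group/haplo pair
haplos <- dcast(haplos, Var2 ~ L1 + Var1, value.var = 'haplotype')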

Result:

haplos 

   Var2 1_haplo1 1_haplo2 2_haplo1 2_haplo2 3_haplo1 3_haplo2 4_haplo1 4_haplo2
1     1     hABA     hABB      hBA      hBA     hAAA     hAAB      hAA      hAA
2     2     <NA>     <NA>      hAB      hAB     hAAB     hABB      hAA      hAA
3     3     hBAA     hAAB      hBA      hBB     hBBB     hBAA      hAA      hBA
4     4     hBBB     hBAA      hBA      hAB     <NA>     <NA>      hAB      hBB
5     5     <NA>     <NA>      hBB      hAA     hABB     hAAA      hAB      hBB
6     6     hABB     hBBB      hBA      hBB     hABA     hAAB      hBB      hBB
7     7     hBBB     hBBB      hAA      hAA     hBBB     hBAA      hAB      hBB
8     8     hBBB     hABA      hBA      hAB     <NA>     <NA>      hAA      hAA
9     9     <NA>     <NA>      hBB      hAA     hAAB     hAAB      hAA      hAB
10   10     hBBB     hBAA      hAA      hBA     hABB     hBBB      hAB      hAB
11   11     <NA>     <NA>      hBB      hBB     hBBA     hBBB     <NA>     <NA>
12   12     hBBB     hABA      hAB      hBB     hABA     hABB     <NA>     <NA>
13   13     <NA>     <NA>     <NA>     <NA>     hABB     hBAA     <NA>     <NA>
14   14     hABB     hBBB     <NA>     <NA>     <NA>     <NA>     <NA>     <NA>
15   15     <NA>     <NA>     <NA>     <NA>     hAAB     hBBA     <NA>     <NA>
16   16     hBAA     hABA     <NA>     <NA>     hAAA     hBBB     <NA>     <NA>
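dcast() names the columns <group>_<haplo>, e.g. 1_haplo1. To match the groupN.haploM names asked for in the question, the prefix can be rewritten with a regular expression. A minimal sketch on a small made-up data frame that only mimics the column-name pattern of the result above:

```r
# Hypothetical stand-in with the same column names dcast() produces
haplos <- data.frame(Var2 = 1:2,
                     `1_haplo1` = c("hABA", NA), `1_haplo2` = c("hABB", NA),
                     check.names = FALSE)

# "1_haplo1" -> "group1.haplo1"; names without a leading "<digits>_" are unchanged
names(haplos) <- sub("^([0-9]+)_", "group\\1.", names(haplos))
names(haplos)
```

check.names = FALSE is only needed to build the stand-in, since data.frame() would otherwise mangle names that start with a digit.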

Thank you very much for your answer. I'd like to accept it, but I'm not getting what I need from the last line: haplos <- dcast(haplos, Var2 ~ L1 + Var1, value.var='haplotype'). I also tried value_var="haplotype", but that gave an error. – jon 2012-02-07 15:32:00


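@John I've edited the answer to include the complete code, which works for me. This is with hapassoc_1.2.4 and reshape2_1.2.1. If you still get an error, could you add it as a comment? – jbaums 2012-02-07 22:15:52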


Thanks. I was using reshape2_1.2.1 and hapassoc_1.2-4 with an old R version; with a newer version of R it works. Thanks again... the reason is unknown. – jon 2012-02-09 01:31:02
