2014-10-07

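Dataframes: merging the outputs from a for loop in R

I have a large data frame like this (only the first three columns are shown). The data frame is called chr22_hap12: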

2 1 3 
2 1 3 
2 1 3 
2 1 2 
2 2 1 
2 2 1 

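I want to get the percentage of each value in every column (the share of ones, twos, and threes in each column) and store the result in a data frame.

This is what I have so far: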

for (i in 1:3) { 

    length(chr22_hap12[,i]) -> total_snps 
    sum(chr22_hap12[,i]==1,na.rm=FALSE) -> counts_ancestry_1 
    sum(chr22_hap12[,i]==2,na.rm=FALSE) -> counts_ancestry_2 
    sum(chr22_hap12[,i]==3,na.rm=FALSE) -> counts_ancestry_3 

    (counts_ancestry_1*100)/total_snps -> ancestry_1_perc 
    (counts_ancestry_2*100)/total_snps -> ancestry_2_perc 
    (counts_ancestry_3*100)/total_snps -> ancestry_3_perc 

    haplo_df[i] = NULL 

    haplo_df[i] = c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc) 
    as.data.frame(haplo_df[i]) 
} 

I get these errors back. When trying to set haplo_df[i] = NULL:

Error in haplo_df[i] = NULL : object 'haplo_df' not found

and after

haplo_df[i] = c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc)

Error in haplo_df[i] = c(ancestry_1_perc, ancestry_2_perc, ancestry_3_perc) : object 'haplo_df' not found

and again with as.data.frame(haplo_df[i]):

object 'haplo_df' not found

My desired output should look like this:

0.00 66.66 50.0 
100.00 33.33 33.33 
0.00 0.00 16.66 

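Just a simple mistake: inside the loop, 'haplo_df' is set to NULL on every iteration, so the only result that would not be erased is the one from the last iteration (when 'i = 3'). – 2014-10-07 15:23:18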
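The error messages in the question also point to a related cause: haplo_df is never created before the loop, and R cannot assign into an index of an object that does not exist. A minimal sketch of that behaviour follows; the name missing_df and the 3 x 3 placeholder shape are illustrative assumptions, not part of the original post.

```r
# Indexed assignment needs an existing object. `missing_df` is a hypothetical
# name that was never created, so the assignment raises an error.
msg <- tryCatch({
  missing_df[1] <- c(0, 100, 0)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Creating the object first makes the same kind of assignment valid.
# The 3 x 3 shape is an assumption matching three values and three columns.
haplo_df <- data.frame(matrix(NA_real_, nrow = 3, ncol = 3))
haplo_df[1] <- c(0, 100, 0)   # fills the first column
print(haplo_df)
```

Once the object exists, the remaining problem is the haplo_df[i] = NULL line inside the loop, which removes a column instead of preparing it.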

Answers


You need to define the result matrix before the loop and then cbind each new result to that matrix:

# define the data.frame before the loop. 
haplo_df <- NULL 
for (i in 1:3) { 
    length(chr22_hap12[,i]) -> total_snps 
    sum(chr22_hap12[,i]==1,na.rm=FALSE) -> counts_ancestry_1 
    sum(chr22_hap12[,i]==2,na.rm=FALSE) -> counts_ancestry_2 
    sum(chr22_hap12[,i]==3,na.rm=FALSE) -> counts_ancestry_3 

    (counts_ancestry_1*100)/total_snps -> ancestry_1_perc 
    (counts_ancestry_2*100)/total_snps -> ancestry_2_perc 
    (counts_ancestry_3*100)/total_snps -> ancestry_3_perc 

    # bind the new result to the existing data 
    haplo_df <- cbind(haplo_df , c(ancestry_1_perc,ancestry_2_perc,ancestry_3_perc)) 
} 
# return the result 
haplo_df 
##  [,1]  [,2]  [,3] 
## [1,] 0 66.66667 33.33333 
## [2,] 100 33.33333 16.66667 
## [3,] 0 0.00000 50.00000 
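Growing an object with cbind works well for three columns. For many columns, a common alternative is to allocate the result once and fill it in place. The following is only a sketch of that variant: the name haplo_mat is an assumption, and chr22_hap12 is rebuilt here from the six rows shown in the question.

```r
# Example data retyped from the question (first three columns only).
chr22_hap12 <- data.frame(V1 = c(2, 2, 2, 2, 2, 2),
                          V2 = c(1, 1, 1, 1, 2, 2),
                          V3 = c(3, 3, 3, 2, 1, 1))
ancestries <- 1:3

# Allocate the full result up front: one row per ancestry, one column per SNP column.
haplo_mat <- matrix(NA_real_,
                    nrow = length(ancestries),
                    ncol = ncol(chr22_hap12),
                    dimnames = list(as.character(ancestries), names(chr22_hap12)))

for (i in seq_len(ncol(chr22_hap12))) {
  column <- chr22_hap12[, i]
  for (a in ancestries) {
    # Percentage of entries in this column equal to ancestry `a`.
    haplo_mat[a, i] <- 100 * sum(column == a) / length(column)
  }
}
haplo_mat
```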

Alternatively, you could just use apply and table, for example:

apply(chr22_hap12, 2, function(x) 100*table(factor(x, levels=1:3))/length(x)) 
##  V1  V2  V3 
## 1 0 66.66667 33.33333 
## 2 100 33.33333 16.66667 
## 3 0 0.00000 50.00000 
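The question asks for a data frame, while apply returns a matrix. One possible way to bridge that, sketched below, is to wrap the call in as.data.frame. The helper name pct is an assumption, and chr22_hap12 is rebuilt from the six rows shown in the question.

```r
# Example data retyped from the question (first three columns only).
chr22_hap12 <- data.frame(V1 = c(2, 2, 2, 2, 2, 2),
                          V2 = c(1, 1, 1, 1, 2, 2),
                          V3 = c(3, 3, 3, 2, 1, 1))

# `pct` is a hypothetical helper. factor(x, levels = 1:3) keeps a cell for
# every ancestry, so a value that never occurs is counted as 0, not dropped.
pct <- function(x) 100 * table(factor(x, levels = 1:3)) / length(x)

# apply() returns a 3 x 3 matrix; as.data.frame() converts it to a data frame.
haplo_df <- as.data.frame(apply(chr22_hap12, 2, pct))
haplo_df
```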

Thank you, it works! – Javier2013 2014-10-07 16:42:09


Here is another approach.

Sample data:

set.seed(23) 
y <- 1:3 
df <- data.frame(a = sample(y, 10, replace = TRUE), 
       b = sample(y, 10, replace = TRUE), 
       c = sample(y, 10, replace = TRUE)) 
#df 
# a b c 
#1 2 3 2 
#2 1 3 1 
#3 1 2 1 
#4 3 1 3 
#5 3 3 2 
#6 2 1 3 
#7 3 2 3 
#8 3 2 3 
#9 3 3 1 
#10 3 2 3 

Compute the percentages:

newdf <- as.data.frame(t(do.call(rbind, lapply(df, function(z){ 
    sapply(y, function(x) (sum(z == x)/length(z))*100) 
})))) 

#newdf
#   a  b  c
#1 20 20 30
#2 20 40 20
#3 60 40 50
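The nested call above can probably be shortened, because the mean of a logical vector is the proportion of TRUE values. The sketch below retypes the ten rows printed in the sample data instead of calling sample(), since sample() output for a given seed changed in R 3.6.0; everything else follows the names used in this answer.

```r
y <- 1:3
# Values copied from the printed sample data, to avoid depending on the RNG.
df <- data.frame(a = c(2, 1, 1, 3, 3, 2, 3, 3, 3, 3),
                 b = c(3, 3, 2, 1, 3, 1, 2, 2, 3, 2),
                 c = c(2, 1, 1, 3, 2, 3, 3, 3, 1, 3))

# For each column z and each value x, mean(z == x) is the share of entries equal to x.
newdf <- as.data.frame(sapply(df, function(z) {
  sapply(y, function(x) mean(z == x) * 100)
}))
newdf
```

Because the inner sapply always iterates over all of y, every column yields three values, even when one of them never occurs.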

Try:

mydf 
    V1 V2 V3 
1 2 1 3 
2 2 1 3 
3 2 1 3 
4 2 1 2 
5 2 2 1 
6 2 2 1 


ll = list() 
for(cc in 1:3) { 
    dd = mydf[,cc] 
    n1 = 100*length(dd[dd==1])/nrow(mydf) 
    n2 = 100*length(dd[dd==2])/nrow(mydf) 
    n3 = 100*length(dd[dd==3])/nrow(mydf) 
    ll[[length(ll)+1]] = c(n1, n2, n3) 
} 
ll 
[[1]] 
[1] 0 100 0 

[[2]] 
[1] 66.66667 33.33333 0.00000 

[[3]] 
[1] 33.33333 16.66667 50.00000 

> t(do.call(rbind, ll)) 
    [,1]  [,2]  [,3] 
[1,] 0 66.66667 33.33333 
[2,] 100 33.33333 16.66667 
[3,] 0 0.00000 50.00000 

My one-liner:

sapply(df, function(x){prop.table(table(x))*100})
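One caveat, shown here as a sketch using the six rows from the question: table(x) only reports values that actually occur, so when a column lacks one of the values the per-column results have different lengths and sapply returns a list, not a matrix. Fixing the levels with factor is one possible remedy; the names ragged and aligned are assumptions made for this illustration.

```r
# Example data retyped from the question; V1 contains only the value 2.
mydf <- data.frame(V1 = c(2, 2, 2, 2, 2, 2),
                   V2 = c(1, 1, 1, 1, 2, 2),
                   V3 = c(3, 3, 3, 2, 1, 1))

# Results of length 1, 2 and 3 cannot be combined, so sapply() gives a list.
ragged <- sapply(mydf, function(x) prop.table(table(x)) * 100)
is.list(ragged)     # TRUE

# With fixed levels every column yields three values, so a matrix comes back.
aligned <- sapply(mydf, function(x) prop.table(table(factor(x, levels = 1:3))) * 100)
is.matrix(aligned)  # TRUE
aligned
```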