Making nested loops more efficient?
I'm using the following script to analyze a large data set:
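M <- c_alignment  # input matrix: one row per trial, one column per variable

# Returns 1 if x equals the current state c_1 (set by the enclosing loop), else 0.
c_check <- function(x){
  if (x == c_1) {
    1
  }else{
    0
  }
}

# Returns 1 if row x has state c_1 in column res_1 and state c_2 in column res_2, else 0.
both_c_check <- function(x){
  if (x[res_1] == c_1 && x[res_2] == c_2) {
    1
  }else{
    0
  }
}

# Product of the two binomial standard deviations; used to normalize the covariance.
variance_function <- function(x,y){
  sqrt(x*(1-x))*sqrt(y*(1-y))
}

frames_total <- nrow(M)
cols <- ncol(M)
c_vector <- apply(M, 2, max)  # number of states per column (states are coded 1..max)
freq_vector <- matrix(nrow = sum(c_vector))
co_freq_matrix <- matrix(nrow = sum(c_vector), ncol = sum(c_vector))
insertion <- 0
res_1_insertion <- 0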

for (res_1 in 1:cols){
  for (c_1 in 1:c_vector[res_1]){
    res_1_insertion <- res_1_insertion + 1
    insertion <- insertion + 1
    # Frequency of state c_1 in column res_1
    res_1_subset <- sapply(M[,res_1], c_check)
    freq_vector[insertion] <- sum(res_1_subset)/frames_total
    res_2_insertion <- 0
    for (res_2 in 1:cols){
      for (c_2 in 1:c_vector[res_2]){
        res_2_insertion <- res_2_insertion + 1
        # Skip entries already filled in through symmetry
        if (is.na(co_freq_matrix[res_1_insertion, res_2_insertion])){
          # Joint frequency of (res_1, c_1) and (res_2, c_2) across all rows
          both_res_subset <- apply(M, 1, both_c_check)
          co_freq_matrix[res_1_insertion, res_2_insertion] <- sum(both_res_subset)/frames_total
          co_freq_matrix[res_2_insertion, res_1_insertion] <- sum(both_res_subset)/frames_total
        }
      }
    }
  }
}

# Covariance: joint frequency minus the product of the marginal frequencies
covariance_matrix <- (co_freq_matrix - crossprod(t(freq_vector)))
variance_matrix <- matrix(outer(freq_vector, freq_vector, variance_function), ncol = length(freq_vector))
correlation_coefficient_matrix <- covariance_matrix/variance_matrix
The model input would look like this:
1 2 1 4 3
1 3 4 2 1
2 3 3 3 1
1 1 2 1 2
2 3 4 4 2
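What I'm calculating is the binomial covariance of each state found in M[,i] with each state found in M[,j]. Each row is the state found for that trial, and I want to see how the states of the columns co-vary. Clarification: I'm finding the covariance of two multinomial distributions, but I'm doing it through binomial comparisons.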
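Written out, and assuming the script above reflects the intended calculation, the quantity for state $a$ of column $i$ and state $b$ of column $j$ over $T$ rows would be as follows. The symbols $p$, $r$, $T$, $a$ and $b$ are notation introduced here for illustration and do not appear in the original post.

$$
p_{i,a} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}[M_{t,i} = a], \qquad
p_{ia,jb} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{1}[M_{t,i} = a]\,\mathbf{1}[M_{t,j} = b]
$$

$$
\operatorname{cov}(ia, jb) = p_{ia,jb} - p_{i,a}\,p_{j,b}, \qquad
r(ia, jb) = \frac{\operatorname{cov}(ia, jb)}{\sqrt{p_{i,a}(1-p_{i,a})}\,\sqrt{p_{j,b}(1-p_{j,b})}}
$$

Here $p_{i,a}$ corresponds to an entry of freq_vector, $p_{ia,jb}$ to an entry of co_freq_matrix, and $r$ to an entry of correlation_coefficient_matrix.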
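The input is a 4200 x 510 matrix, and the c value for each column averages about 15. I know for loops are very slow in R, but I'm not sure how to use the apply functions here. If anyone has a suggestion for properly using apply here, I'd really appreciate it. Right now the script takes several hours. Thanks!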
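For illustration, here is a minimal sketch of one way the same quantities might be computed without the nested loops. It is not the asker's method: it swaps the per-pair row scans for a 0/1 indicator matrix and a single matrix product. It assumes the states in each column are coded 1..max (as c_vector <- apply(M, 2, max) implies) and uses the five example rows from the question as toy input. The names indicator and sd_vector are introduced here and are hypothetical.

```r
# Toy input: the five example rows shown in the question.
M <- matrix(c(1, 2, 1, 4, 3,
              1, 3, 4, 2, 1,
              2, 3, 3, 3, 1,
              1, 1, 2, 1, 2,
              2, 3, 4, 4, 2),
            nrow = 5, byrow = TRUE)

frames_total <- nrow(M)
c_vector <- apply(M, 2, max)  # states per column, assuming coding 1..max

# One 0/1 indicator column per (column, state) pair, in the same order the
# nested loops visit them: column 1 states 1..c, then column 2, and so on.
# "indicator" is a hypothetical name, not from the original script.
indicator <- do.call(cbind, lapply(seq_len(ncol(M)), function(j) {
  outer(M[, j], seq_len(c_vector[j]), "==") * 1
}))

# Marginal frequency of each (column, state) pair.
freq_vector <- colMeans(indicator)

# Joint frequency of every pair of (column, state) pairs in one product.
co_freq_matrix <- crossprod(indicator) / frames_total

# Covariance: joint frequency minus product of marginals.
covariance_matrix <- co_freq_matrix - tcrossprod(freq_vector)

# Binomial standard deviations; entries are 0 for states that never occur
# or always occur, which yields NaN in the corresponding correlations.
sd_vector <- sqrt(freq_vector * (1 - freq_vector))
correlation_coefficient_matrix <- covariance_matrix / tcrossprod(sd_vector)
```

For a 4200 x 510 input with about 15 states per column, the indicator matrix would have roughly 7650 columns, so the result is a roughly 7650 x 7650 matrix. Whether that fits comfortably in memory depends on the machine, so this is a sketch to adapt rather than a drop-in replacement.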
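Could you please add a small dataset and what you are trying to get? – aatrujillob 2012-02-16 22:12:58
@AndresT Added more information. – 2012-02-16 22:22:24
Have you tried turning on the 'loop unrolling' optimizer in your compiler? – 2012-02-16 22:36:34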