2015-09-17 68 views
2

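Making nested foreach loops more efficient in R?

I wrote a function that runs 3 nested foreach loops in parallel. The goal of the function is to split a list of 30 [10,5] matrices (i.e. [[30]][10,5]) into a list of 5 [10,30] matrices (i.e. [[5]][10,30]).

However, I am trying to run this function with 1,000,000 paths (i.e. foreach (m = 1:1000000)) and, unsurprisingly, performance is terrible.

I would like to avoid the apply functions if possible, because I have found that they do not work well inside parallel foreach loops: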

library(foreach)
library(doParallel)

# input matr: a list of 30 [10,5] matrices
matrix_splitter <- function(matr) {
    time_horizon <- 30
    paths <- 10
    asset <- 5

    # one output row per (asset i, path m); one column per matrix in matr
    security_paths <- foreach(i = 1:asset, .combine = rbind, .packages = "doParallel", .export = "matr") %dopar% {
        foreach(m = 1:paths, .combine = rbind, .packages = "doParallel", .export = "matr") %dopar% {
            foreach(p = matr, .combine = c) %dopar% {
                p[m, i]
            }
        }
    }
    df_securities <- as.data.frame(security_paths)
    # sample() draws 5 labels from 1:10; split() recycles them over the 50 rows
    split(df_securities, sample(rep(1:paths), asset))
}

Overall, I am trying to convert this data format:

[[30]] 
      [,1]  [,2]  [,3]  [,4]  [,5] 
[1,] 0.2800977 2.06715521 0.9196326 0.3560659 1.36126507 
[2,] -0.5119867 0.24329025 0.1513218 -1.2528092 -0.04795098 
[3,] -2.0293933 -1.17989270 0.3053376 -0.9528611 0.86758140 
[4,] -0.6419024 -0.24846720 -0.6640066 -1.7104961 -0.32759406 
[5,] -0.4340359 -0.44034013 3.3440507 0.7380613 2.
[6,] -0.6679914 -0.01332117 1.9286056 -0.7194116 0.15549978 
[7,] 0.5919820 0.11616685 -0.8424634 -0.7652715 1.34176688 
[8,] 0.8079152 0.40592119 -0.4291811 0.9358829 -0.97479314 
[9,] -0.0265207 -0.03598320 1.1287344 0.4732984 1.37792596 
[10,] 1.0553966 0.65776721 -1.2833613 -0.2414846 0.81528686 

into this format (continuing up to V30, of course):

$`5` 
V1   V2   V3   V4   V5   V6   V7 
result.2 -0.11822260 1.7712833 1.97737285 -1.6643193 0.4788075 1.2394064 1.4800787 
result.7 -1.23251178 0.4267885 -0.07728632 0.3463092 0.8766395 0.6324840 0.5946710 
result.2.1 -1.27309457 -0.3128173 -0.79561297 -0.4713307 -0.4344864 0.4688124 -0.5646857 
result.7.1 0.51702719 -1.6242650 -2.37976199 -0.1088408 0.4846507 -0.7594376 0.9326529 
result.2.2 1.77550390 0.9279155 0.26168402 0.4893835 1.4131326 0.5989508 -0.3434010 
result.7.2 -0.01590682 -0.5568578 1.35789122 -0.1385092 -0.4501515 -0.2581724 0.5451699 
result.2.3 0.30400225 -1.0245640 -0.05285694 -0.1354228 0.3070331 -0.7618850 1.0330961 
result.7.3 -0.08139912 0.4106541 1.40418839 0.2471505 1.2106539 1.3844721 0.4006751 
result.2.4 0.94977544 -0.8045054 1.48791211 1.4361686 -0.3789274 -1.9570125 -1.6576634 
result.7.4 0.70449194 1.6887800 0.56447340 0.6465640 2.6865388 -0.7367524 0.6242624 
        V8   V9   V10   V11  V12   V13 
result.2 -0.432404728 -1.6225350 0.09855465 0.17371907 0.3081843 0.15148452 
result.7 -0.597420706 0.6173004 0.07518596 2.01741406 0.1767152 -0.39219471 
result.2.1 0.918408322 -1.6896424 -0.13409626 0.38674224 0.3491750 -1.61083286 
result.7.1 2.564057340 -0.7696399 1.06103614 1.38528367 1.1684045 -0.08467871 
result.2.2 0.951995816 0.1910284 1.79943500 2.13909498 0.2847664 0.31094568 
result.7.2 -0.479349220 -0.2368760 0.04298525 -0.40385960 0.3986555 -1.93499213 
result.2.3 -1.382370069 1.0459845 -0.33106323 -0.43362925 0.7045572 -0.30211601 
result.7.3 -1.457106442 0.1487447 -2.52392942 -0.02399523 -1.0349746 0.87666365 
result.2.4 -0.848879365 0.7521024 0.16790915 0.47112444 0.8886361 -0.12733039 
result.7.4 -0.003350467 0.4021858 -1.80031445 -1.42399232 1.0507765 -0.36193846 
+1

How do you want it rearranged? None of the numbers in your example input appear in the output.

+0

It really is just going from [[30]][10,5] to [[5]][10,30].

+1

I don't find the explanation very clear at all, but I suspect you may find the package (and function) **abind** helpful, followed by the function 'aperm'. – joran
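As a rough illustration of the reshape that comment points at, here is a sketch in base R only. It assumes made-up input with the question's dimensions (the name lst is hypothetical), and it uses simplify2array() as a stand-in for abind::abind(lst, along = 3) so that no extra package is needed.

```r
# Hypothetical input: 30 time steps, each a [10 paths, 5 assets] matrix.
set.seed(1)
lst <- replicate(30, matrix(rnorm(10 * 5), nrow = 10, ncol = 5), simplify = FALSE)

# Stack the list into a [10, 5, 30] array (paths, assets, time).
arr <- simplify2array(lst)

# Reorder the dimensions to [10, 30, 5] (paths, time, assets).
arr_pta <- aperm(arr, c(1, 3, 2))

# One [10, 30] matrix per asset.
by_asset <- lapply(seq_len(dim(arr_pta)[3]), function(i) arr_pta[, , i])

dim(by_asset[[1]])
```

Element [m, t] of by_asset[[i]] is lst[[t]][m, i], which is the layout the question asks for. Whether this is fast enough at a million paths is not something this sketch establishes.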

Answers

1

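plyr is made for exactly this kind of problem, thanks to alply. The idea is: unlist the list, arrange it into an array with the appropriate dimensions, and then use alply to turn that array into a list of matrices.

Example: transforming a list of 2 3x5 matrices into a list of 5 2x3 matrices: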

library(plyr) 

lst = list(matrix(1:15, ncol=5), matrix(10:24, ncol=5)) 

alply(array(unlist(lst), c(2,3,5)),3) 

#$`1` 
#  [,1] [,2] [,3] 
#[1,] 1 3 5 
#[2,] 2 4 6 

#$`2` 
#  [,1] [,2] [,3] 
#[1,] 7 9 11 
#[2,] 8 10 12 

#$`3` 
#  [,1] [,2] [,3] 
#[1,] 13 15 11 
#[2,] 14 10 12 

#$`4` 
#  [,1] [,2] [,3] 
#[1,] 13 15 17 
#[2,] 14 16 18 

#$`5` 
#  [,1] [,2] [,3] 
#[1,] 19 21 23 
#[2,] 20 22 24 
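
For readers without plyr, the same slicing can be sketched in base R. This is only an illustration of what the example above computes, with lapply() used here as a stand-in for alply(arr, 3); the names arr and slices are hypothetical.

```r
lst <- list(matrix(1:15, ncol = 5), matrix(10:24, ncol = 5))

# array() fills in storage order: all of lst[[1]] column by column, then lst[[2]].
arr <- array(unlist(lst), c(2, 3, 5))

# One 2x3 matrix per index of the third dimension.
slices <- lapply(seq_len(dim(arr)[3]), function(k) arr[, , k])

slices[[3]]
```

Because array() only re-chunks the flattened vector, a slice can straddle two input matrices, as the third one does here (13, 14, 15 come from the first matrix and 10, 11, 12 from the second). It is worth checking that this ordering is the one you need before scaling it up.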
+0

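Thanks. This works well for small-scale problems. However, when I get to 5,000,000 paths, the vector created by unlist() is too large (5.6 GB). Is there a way to do this without 'unlist()'?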
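
One possible way to avoid the single flattened vector, sketched under assumptions: the input below is small made-up data, and extract_asset is a hypothetical helper, not something from the thread. It builds each output matrix directly from the matching column of every input matrix.

```r
# Hypothetical small input; the real case would have millions of rows per matrix.
set.seed(42)
n_paths <- 10
n_assets <- 5
n_steps <- 30
lst <- replicate(n_steps, matrix(rnorm(n_paths * n_assets), n_paths, n_assets),
                 simplify = FALSE)

# For asset i, pull column i out of every matrix and bind the columns side by side.
# vapply() returns an [n_paths, n_steps] matrix without flattening the whole list.
extract_asset <- function(i) {
  vapply(lst, function(m) m[, i], numeric(n_paths))
}

by_asset <- lapply(seq_len(n_assets), extract_asset)
```

The result is still as large as the input, so memory for both has to be available; the sketch only avoids the extra intermediate copy that unlist() creates.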

0

I convert the whole list into one great big vector and then re-dimension it.

For my solution, I started with:

[[28]] 
     [,1] [,2] [,3] [,4] [,5] 
    [1,] 1 11 21 31 41 
    [2,] 2 12 22 32 42 
    [3,] 3 13 23 33 43 
    [4,] 4 14 24 34 44 
    [5,] 5 15 25 35 45 
    [6,] 6 16 26 36 46 
    [7,] 7 17 27 37 47 
    [8,] 8 18 28 38 48 
    [9,] 9 19 29 39 49 
[10,] 10 20 30 40 50 

repeated 30 times. This is the variable orig. My code:

flattened.vec <- unlist(orig) # flatten the list of matrices into one big vector
dim(flattened.vec) <- c(10, 150) # one column per (matrix, asset) pair so the re-shape comes out right
transposed.matrix <- t(flattened.vec) # transpose so that row r holds asset (r - 1) %% 5 + 1
new.matrix.list <- split(transposed.matrix, (row(transposed.matrix) - 1) %% 5 + 1) # split by asset into 5 vectors of 300 values each

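This code gives you 5 vectors of 300 values each. Give each one dim c(30, 10) and then t() it, for example in a foreach, to get 5 10×30 matrices (I would normally use an apply function; I am not familiar with the library). After doing that, one of the 5 resulting matrices looks like this: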

 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] 
[1,] 1 1 1 1 1 1 1 1 1  1  1  1  1  1  1  1  1 
[2,] 2 2 2 2 2 2 2 2 2  2  2  2  2  2  2  2  2 
[3,] 3 3 3 3 3 3 3 3 3  3  3  3  3  3  3  3  3 
[4,] 4 4 4 4 4 4 4 4 4  4  4  4  4  4  4  4  4 
[5,] 5 5 5 5 5 5 5 5 5  5  5  5  5  5  5  5  5 
[6,] 6 6 6 6 6 6 6 6 6  6  6  6  6  6  6  6  6 
[7,] 7 7 7 7 7 7 7 7 7  7  7  7  7  7  7  7  7 
[8,] 8 8 8 8 8 8 8 8 8  8  8  8  8  8  8  8  8 
[9,] 9 9 9 9 9 9 9 9 9  9  9  9  9  9  9  9  9 
[10,] 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 

     [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] 
[1,]  1  1  1  1  1  1  1  1  1  1  1  1  1 
[2,]  2  2  2  2  2  2  2  2  2  2  2  2  2 
[3,]  3  3  3  3  3  3  3  3  3  3  3  3  3 
[4,]  4  4  4  4  4  4  4  4  4  4  4  4  4 
[5,]  5  5  5  5  5  5  5  5  5  5  5  5  5 
[6,]  6  6  6  6  6  6  6  6  6  6  6  6  6 
[7,]  7  7  7  7  7  7  7  7  7  7  7  7  7 
[8,]  8  8  8  8  8  8  8  8  8  8  8  8  8 
[9,]  9  9  9  9  9  9  9  9  9  9  9  9  9 
[10,] 10 10 10 10 10 10 10 10 10 10 10 10 10 
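
The steps above can be sketched end to end as follows. The input is hypothetical: unlike the example in this answer, every value is unique, so a misplaced element would be visible. The names flat, tr, parts and result are illustrative only.

```r
# Hypothetical input: orig[[s]][m, i] = s * 1000 + (i - 1) * 10 + m.
orig <- lapply(1:30, function(s) matrix(s * 1000 + 1:50, nrow = 10, ncol = 5))

flat <- unlist(orig)
dim(flat) <- c(10, 150)   # column (s - 1) * 5 + i holds asset i at step s
tr <- t(flat)             # 150 x 10: rows are (step, asset) pairs, columns are paths

# Group by asset: row r of tr belongs to asset (r - 1) %% 5 + 1.
parts <- split(tr, (row(tr) - 1) %% 5 + 1)

# Each part holds 300 values ordered path by path; reshape to [30, 10], then transpose.
result <- lapply(parts, function(v) t(matrix(v, nrow = 30, ncol = 10)))

dim(result[[1]])
```

With this layout, result[[i]][m, s] equals orig[[s]][m, i]. The sketch still relies on unlist(), so it shares that step's memory cost.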

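By the way, this is probably what the plyr package already does internally (as posted by Colonel Beauvel), just done by hand instead of with an external library.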

+0

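Thanks. This works well for small-scale problems. However, when I get to 5,000,000 paths, the vector created by unlist() is too large (5.6 GB). Is there a way to do this without 'unlist()'?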

+0

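Does the data have to come in as a list? From my testing, 'unlist()' seems slow (setting 'use.names = FALSE' helps a little, but not much). If you can start from a three-dimensional array instead, you would gain efficiency (the storage required for your data set would shrink by about 10%). – Frank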
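
A minimal sketch of that suggestion, assuming the simulation can write straight into a [paths, assets, steps] array. The name sims and the random data are hypothetical.

```r
# Hypothetical simulation output stored directly as [paths, assets, steps].
set.seed(7)
n_paths <- 10
n_assets <- 5
n_steps <- 30
sims <- array(rnorm(n_paths * n_assets * n_steps), c(n_paths, n_assets, n_steps))

# Fixing the asset index drops that dimension and leaves a [paths, steps] matrix.
asset_3 <- sims[, 3, ]

# If the data must arrive as a list, unlist() can at least skip name handling.
as_list <- lapply(seq_len(n_steps), function(s) sims[, , s])
flat <- unlist(as_list, use.names = FALSE)
```

With the array layout, the per-asset matrix needs no reshaping at all, only an index. The storage saving mentioned in the comment is not measured here.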