Trouble preallocating a complicated loop in MATLAB

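I am fairly new to MATLAB, but for my work I need to import an ENORMOUS data set and organize it in a certain way. I wrote code that does this, but very inefficiently (this is only my third major piece of code, and it takes several hours to run). MATLAB tells me that I could preallocate my variables (about fifty times, in fact), but I can't see how to do that, because I am not sure which matrix each iteration of the for loop will add data to. The code itself probably explains this better than I can.
(This is just a small piece, but hopefully it shows my problem.)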

for x= 1:length(firstinSeq) 
      for y= 1:length(littledataPassed-1) 
       if firstinSeq(x,1)== littledataPassed(y,1) && firstinSeq(x,2)== littledataPassed(y,2) 
         switch firstinSeq(x,3) 
          case 0 
           for z= 0:1000 
            w= y+z; 
            if firstinSeq(x,4)== littledataPassed(w,4) 
             if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0 
              msgLength0= [msgLength0; firstinSeq(x,:) littledataPassed(w,:)]; 
              break 
             else continue 
             end 
            else msgLength0= [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]]; 
             break 
            end 
           end 
          case 1 
           for z= 0:1000 
            w= y+z; 
            if firstinSeq(x,4)== littledataPassed(w,4) %if sequence not the same, terminate 
             if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0 
              msgLength1= [msgLength1; firstinSeq(x,:) littledataPassed(w,:)]; 
              break 
             else continue 
             end 
            else msgLength1= [msgLength1; firstinSeq(x,:) [0 0 1 0 0 0]]; 
             break   
            end 
           end 
          case 2 
           for z= 0:1000 
            w= y+z; 
            if firstinSeq(x,4)== littledataPassed(w,4) 
             if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 0 
              msgLength2= [msgLength2; firstinSeq(x,:) littledataPassed(w,:)]; 
              break 
             else continue 
             end 
            else msgLength2= [msgLength2; firstinSeq(x,:) [0 0 2 0 0 0]]; 
             break 
            end 
           end 
           for z= 0:1000 
            w= y+z; 
            if firstinSeq(x,4)== littledataPassed(w,4) 
             if littledataPassed(w,6)== 1 && firstinSeq(x,2)== littledataPassed(w,2) && littledataPassed(w,5)== 1 
              msgLength2= [msgLength2; firstinSeq(x,:) littledataPassed(w,:)]; 
              break 
             else continue 
             end 
            else msgLength2= [msgLength2; firstinSeq(x,:) [0 0 2 0 1 0]]; 
             break 
            end 
           end

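Any ideas on how I can preallocate these variables (msgLength0, 1, 2, etc.)? They do not get data added for every value in the loop, and I am not sure of the final size for each run. Right now my switch has eight cases in total, which makes this program very slow.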

Source

2010-07-22 Maxwell

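If I read your code correctly, one of the msgLengthN variables is extended on each trip through the innermost loop? If so, that suggests you might want to preallocate a single array called msgLengthAll and fill it as you go, making sure each entry carries a value that distinguishes case 0, 1, 2, and so on.

If you don't know up front how much space to allocate for msgLengthAll, you could either:

Scan the input files once to determine how big this array, and any others you need, must be. Reading large files more than once to process them may well save you a lot of time. Or
Indulge in some fancy allocation scheme in which you initially guess how much space is needed and then, when it fills up, allocate more memory. There are various ways to decide how much to allocate at each expansion: a fixed size, or an amount related to what is already allocated (e.g. doubling the allocation at each expansion). This can of course get rather complicated.

Are you reading the files line by line and updating variables in memory? Or are you reading a whole file and then sorting it out in memory? How big is enormous? How much RAM do you have?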
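
To make the second option concrete, here is a minimal sketch of a doubling scheme, not a drop-in replacement for the script in the question. The data in rowsToAdd is made up, the starting capacity of 8 is an arbitrary guess, and putting the case label in column 1 is only one possible layout; in the real script each stored row would presumably be the case label followed by firstinSeq(x,:) and littledataPassed(w,:).

```matlab
% Sketch: grow a preallocated result array by doubling when it fills up.
% rowsToAdd is made-up stand-in data (assumption, not from the question).
% Column 1 plays the role of the case label (0, 1, 2).
rowsToAdd = [mod((1:25)', 3), (1:25)' * ones(1, 4)];   % 25 rows, 5 columns

capacity = 8;                                          % initial guess (assumed)
msgLengthAll = zeros(capacity, size(rowsToAdd, 2));    % preallocate once
nUsed = 0;                                             % rows filled so far

for k = 1:size(rowsToAdd, 1)
    if nUsed == capacity
        % Array is full: double it by appending a zero block of equal size.
        msgLengthAll = [msgLengthAll; zeros(capacity, size(msgLengthAll, 2))];
        capacity = 2 * capacity;
    end
    nUsed = nUsed + 1;
    msgLengthAll(nUsed, :) = rowsToAdd(k, :);          % write into a free slot
end

msgLengthAll = msgLengthAll(1:nUsed, :);               % trim the unused tail

% Split by case label afterwards if separate arrays are still wanted.
msgLength0 = msgLengthAll(msgLengthAll(:, 1) == 0, 2:end);
```

With doubling, the array is reallocated only a handful of times (here 8, 16, 32 rows) instead of once per appended row, which is the cost the preallocation warning is pointing at.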

Source

2010-07-22 15:52:23

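Thanks for the reply. It is the whole data set for a research project. 210 text files of about 30,000 lines and 210 text files of about 80,000 lines are being read into MATLAB (they come in pairs). The way I tried to set up my script is to read a pair of files, match the corresponding lines of data, and then run this ridiculous loop on the lines I want to process. About 200,000 lines of data are processed by this loop (the firstinSeq variable). So to answer your question: import, then sort. I would post my whole script for you, but it is very unfriendly and runs to 600 lines. – Maxwell 2010-07-22 16:38:57

@Maxwell: So if you take one pair of matched files, do the 30,000 lines in one match the 80,000 lines in the other at roughly 1:~3? And are you then trying to build an array in MATLAB that stores aggregated data from both? For this kind of data wrangling it is often a good idea to use utilities such as sed and awk as much as possible before you start reading the data into MATLAB, for performance if for no other reason. – 2010-07-22 17:59:40

OK, so since the first post went up I have been playing with this code and have modified it a lot. I finally figured out how to preallocate this. The code is still slow, because I don't really know how to simplify/vectorize this loop, but it is much improved. – Maxwell 2010-07-22 18:58:38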

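You can vectorize the processing in each switch case by finding the indices of the records within the 1000-element block that meet your criteria and then appending them to msgLength0 in one go. Here is a vectorized version of the case 0 code: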

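% Rows y..y+1000 of littledataPassed, matching z = 0:1000 in the loop.
block = littledataPassed(y:y+1000,:);
% First row whose sequence (column 4) differs; MATLAB uses ~= for "not equal".
indexStop = find(block(:,4) ~= firstinSeq(x,4), 1, 'first');
if isempty(indexStop)
    indexStop = size(block,1) + 1;  % sequence never changes within the block
end
% First in-sequence row that meets the criteria.
indexProcess = find(block(1:indexStop-1,6) == 1 & ...
    block(1:indexStop-1,2) == firstinSeq(x,2) & ...
    block(1:indexStop-1,5) == 0, 1, 'first');
if ~isempty(indexProcess)
    msgLength0 = [msgLength0; firstinSeq(x,:) block(indexProcess,:)];
elseif indexStop <= size(block,1)
    msgLength0 = [msgLength0; firstinSeq(x,:) [0 0 0 0 0 0]];
end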

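Vectorizing the outer loops would also do a lot to reduce execution time. I don't know enough about your data to suggest a specific approach, but using the reshape and/or repmat functions to create arrays that you can operate on in a vectorized way may be the way to go.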
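
As one possible starting point for the outer loops, the inner loop over y and its if test can be replaced by a single logical comparison per x. This is only a sketch on made-up toy data: it uses plain logical indexing with find rather than reshape or repmat, and the matrices below are hypothetical stand-ins for the real firstinSeq and littledataPassed.

```matlab
% Toy stand-ins for the real data (values are made up for illustration).
firstinSeq       = [1 7 0 3; 2 9 1 4];
littledataPassed = [1 7 0 3 0 1;
                    2 9 0 4 0 1;
                    1 7 0 3 0 0;
                    5 5 0 1 0 1];
x = 1;

% Rows of littledataPassed whose columns 1 and 2 both match firstinSeq(x,1:2).
% This one logical test replaces the "for y = ..." loop and the if inside it.
yMatches = find(littledataPassed(:,1) == firstinSeq(x,1) & ...
                littledataPassed(:,2) == firstinSeq(x,2));

% yMatches is [1; 3] for the toy data above.
for y = yMatches'
    % ... run the per-case processing for this (x, y) pair here ...
end
```

The loop over x is still there, but each pass now visits only the rows that actually match instead of testing every row of littledataPassed.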

Source

2010-07-22 20:53:18
