2015-04-20

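Why does Matlab loop slowly over a large number of iterations but quickly over a small number?

I am running a function to extract some information from 100,000+ patient x-ray DICOM files. For security purposes, the files are stored inside a VeraCrypt encrypted container.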

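When I run the function on a small subset of the files it executes very quickly, but when I run it on the whole dataset it is considerably slower, dropping from a few files per second to roughly 1 file per second.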

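I am puzzled as to why this happens. I have tried storing the data on an SSD and on a normal hard drive, and in both cases processing is slower with the larger dataset than with the smaller one.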

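I have added the code below for reference, but I have not fully commented it yet. This is for my thesis, so I will do that once the extraction is finished.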

Thanks for your help.

function [ DB, corrupted_files ] = extract_from_dcm(folder_name) 
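%EXTRACT_FROM_DCM Extract patient metadata from the DICOM files under a folder.
%   DB is a struct array of the extracted fields, indexed like found_dicoms.
%   corrupted_files is whatever corruption_check returns for unreadable files.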

if nargin == 0 
    folder_name = 'I:\Find and Treat\MXU Old Backup\2005'; 
end 

Database_Check = strcat(folder_name, '\DataBase.mat'); 

if exist(Database_Check, 'file') 

    load(Database_Check); 
    entry_start = length(DB) + 1; 

else 

    entry_start = 1; 
    DB = struct([]); % keep DB defined even if no file can be read
    [ found_dicoms ] = recursive_search(folder_name); 

end 

mat_file_location = strcat(folder_name, '\DataBase.mat'); 
excel_DB_file = strcat(folder_name, '\DataBase.xlsx'); 
excel_Corrupted_file = strcat(folder_name, '\Corrupted_Files.xlsx'); 

% The recursive search creates a struct array with the path of each 
% dcm file found. The loop below then walks that list, opens each 
% image and extracts the relevant information from it. 


fprintf('---------------------------------------------\n'); 
fprintf('Start Patient Data Extraction\n'); 
tic 
h = waitbar(0,'','Name','Patient Data Extraction'); 
entry_end = length(found_dicoms); 

if entry_end == 0 
    %  set(handles.info_box, 'String', 'No Dicom Files Found in this Folder or its Subfolders'); 
else 
    %  set(handles.info_box, 'String', 'Congratulations Dicom Files have been found Look Through the Data Base using the Buttons Below....Press Save Button to save the Database. (Database Save format is EXCEL SpreadSheet and MAT file'); 

    for kk = entry_start : entry_end 
     progress = kk/entry_end; 
     progress_percent = round(progress * 100); 
     waitbar(progress,h, sprintf('%d%% %d/%d of images processed', progress_percent, kk, entry_end)); 

     img_full_path = found_dicoms(kk).name; 

     %   search_path = folder_name; 
     %   img_full_path = strrep(img_full_path, search_path, ''); 

     try    % reading the header doubles as a first corruption check 
      dicom_info = dicominfo(img_full_path); 

      try    % reading the pixel data doubles as a second corruption check 
       dicom_read = dicomread(dicom_info); 

       old = dicom_info; % reuse the header read above instead of parsing the file again 

       DB(kk).StudyDate = old.StudyDate; 
       DB(kk).StudyTime = old.StudyTime; 


       % GivenName is the forename and FamilyName is the last name 
       if isfield(old.PatientName, 'GivenName') 
        DB(kk).Forename = old.PatientName.GivenName; 
       else 
        DB(kk).Forename = 'NA'; 
       end 

       if isfield(old.PatientName, 'FamilyName') 
        DB(kk).LastName = old.PatientName.FamilyName; 
       else 
        DB(kk).LastName = 'NA'; 
       end 

       if isfield(old, 'PatientSex') 
        DB(kk).PatientSex = old.PatientSex; 
       else 
        DB(kk).PatientSex = 'NA'; 
       end 

       if isempty(old.PatientBirthDate) 
        DB(kk).PatientBirthDate = '00000000'; 
       else 
        DB(kk).PatientBirthDate = old.PatientBirthDate; 
       end 

       if strcmp(old.Manufacturer, 'Philips Medical Systems') 
        DB(kk).Van = '1'; 
       else 
        DB(kk).Van = '0';% section to represent organising by different vans 
       end 

       DB(kk).img_Path = img_full_path; 

       save(mat_file_location,'DB','found_dicoms'); 

      catch exception %# Catch the exception 
       fprintf('read - file %d corrupt.\n',kk); 
       continue  %# Pass control to the next loop iteration 
      end 



     catch exception %# Catch the exception 
      fprintf('info - file %d corrupt.\n',kk); 
      continue  %# Pass control to the next loop iteration 
     end 


    end 

end 


[ corrupted_files, DB ] = corruption_check(DB, found_dicoms, folder_name); 


toc 
fprintf('End Patient Data Extraction\n'); 
fprintf('---------------------------------------------\n'); 

fprintf('---------------------------------------------\n'); 
fprintf('Start Saving Extracted Data \n'); 
tic 


save(mat_file_location,'DB','corrupted_files','found_dicoms'); 

if isempty(DB) 
    msgbox('No Dicom Files Found'); 
else 
    DB_table = struct2table(DB); 
    writetable(DB_table, excel_DB_file); 
end 

close(h); 

toc 
fprintf('End Saving Extracted Data \n'); 
fprintf('---------------------------------------------\n'); 

end 
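Just throwing this out there without any evidence to back it up: you really should preallocate the 'DB' array, and it doesn't look like you do. You know how many elements there will be, namely 'entry_end - entry_start + 1'. Matlab should be warning you where you write 'DB(kk)', because you keep changing its size, which is slow since it causes reallocation. To be sure, run the [profiler](http://www.mathworks.com/help/matlab/ref/profile.html). –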
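As a rough illustration of the preallocation suggested in this comment, here is a minimal sketch. The element count `n`, the reduced field list and the placeholder values are assumptions made for the example; they are not taken from the asker's data.

```matlab
% Minimal sketch: preallocate a struct array before filling it in a loop.
n = 1000;                                   % hypothetical number of files
field_names = {'StudyDate', 'StudyTime', 'img_Path'};

% Build one struct whose fields are all empty, then tile it to 1-by-n.
template = cell2struct(cell(numel(field_names), 1), field_names, 1);
DB = repmat(template, 1, n);

for kk = 1:n
    % Placeholder values; real code would copy them from dicominfo output.
    DB(kk).StudyDate = '20050101';
    DB(kk).StudyTime = '120000';
    DB(kk).img_Path  = sprintf('file_%d.dcm', kk);
end
```

Because `DB` already has its final size, the assignments inside the loop only fill existing elements and never have to grow the array.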

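Updating that waitbar may be eating a lot of the execution time. Graphics updates inside a loop are very expensive. Try commenting it out and see whether that helps. Better yet, use the profiler. – siliconwafer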
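If the waitbar turns out to matter but is still wanted, one option is to redraw it only every so often. The sketch below assumes an update interval of 50 iterations (`update_every`, a made-up value) and uses an empty loop body in place of the real per-file work.

```matlab
% Minimal sketch: update the waitbar only every update_every iterations.
n = 1000;                 % hypothetical number of files
update_every = 50;        % assumed interval, tune to taste
h = waitbar(0, '', 'Name', 'Patient Data Extraction');
n_updates = 0;            % counts how often the graphics were redrawn

for kk = 1:n
    % ... per-file work would go here ...

    if mod(kk, update_every) == 0 || kk == n
        waitbar(kk / n, h, sprintf('%d/%d of images processed', kk, n));
        n_updates = n_updates + 1;
    end
end

close(h);
```

With these numbers the bar is redrawn 20 times instead of 1000.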

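Cool, thanks. Preallocating is something I was planning to do, and I will look at the waitbar. The thing is, as the code stands, the larger the loop the slower it gets, and I don't understand why that would be. – bilaly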
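Both comments above point to the profiler as the way to find out where the time goes. A minimal sketch of running it from code follows; the loop around `magic` is only a stand-in workload, and the report may list few or no functions when the profiled code consists purely of built-ins.

```matlab
% Minimal sketch: profile a loop and list the most expensive functions.
profile on
total = 0;
for kk = 1:200
    total = total + sum(sum(magic(50)));    % stand-in for the real work
end
profile off

p = profile('info');                        % struct with a FunctionTable field
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');
top = p.FunctionTable(order(1:min(5, numel(order))));

for ii = 1:numel(top)
    fprintf('%-40s %8.3f s\n', top(ii).FunctionName, top(ii).TotalTime);
end
```

Profiling a run of a few hundred files and another of a few thousand, then comparing the two reports, should show which call gets more expensive as the loop grows.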

Answer

OK, thanks for all the help.

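Part of my problem was the save at the end of every iteration, but the biggest problem was the line where I run the dicomread function. I have changed the save so that it happens once every 20 images.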
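A likely reason the per-iteration save hurts more as the loop grows is that each call to save writes the whole of DB (and, in the question's code, found_dicoms as well), so the cost of one save rises with the number of entries already filled. Saving every 20 images can be sketched as follows; the temporary file name and the single `idx` field are assumptions for the example.

```matlab
% Minimal sketch: checkpoint the database every save_every iterations.
n = 100;                          % hypothetical number of files
save_every = 20;                  % interval mentioned in the answer
mat_file = [tempname '.mat'];     % hypothetical checkpoint location
DB = repmat(struct('idx', []), 1, n);
n_saves = 0;                      % counts how many times the file is written

for kk = 1:n
    DB(kk).idx = kk;              % stand-in for the extracted DICOM fields

    if mod(kk, save_every) == 0 || kk == n
        save(mat_file, 'DB');     % writes the whole array each time
        n_saves = n_saves + 1;
    end
end
```

The trade-off is that a crash loses at most the last `save_every - 1` entries, which the existing resume logic (`entry_start = length(DB) + 1`) would then reprocess.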

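I also tried it without the preallocation suggested in the comments, to see what difference that made once dicomread and the save overhead were out of the way. It was much slower than with preallocation.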

...I just need to find a solution for dicomread (I use it as a way to check whether the file is corrupted or not).

I would, but it says I have to wait 2 days before I can accept it. – bilaly