我需要處理大量的混合類型的表格數據 - 字符串和雙打。一個標準問題,我會想。 Matlab中處理這個問題的最好的數據結構是什麼?混合類型的Matlab數據結構 - 什麼是時間+空間高效?
Cellarray絕對不是答案。這是非常低效的內存。 (測試如下所示)。數據集(來自統計工具箱)在時間和空間上都非常低效。這給我留下了結構陣列或數組結構。我對下面的時間和內存的所有四個不同選項做了測試,在我看來,數組結構對於我測試的東西來說是最好的選擇。
我對Matlab比較新,坦白說這有點令人失望。無論如何 - 尋找我是否缺少東西的建議,或者我的測試是否準確/合理。除了訪問/轉換/內存使用情況之外,我還會錯過其他一些考慮事項,這些使用情況很可能會在我使用這些東西進行編碼時出現。 (使用R2010b)
* *測試#1:訪問速度訪問數據項。
cellarray:0.002s
dataset:36.665s %<<< This is horrible
structarray:0.001s
struct of array:0.000s
* *測試#2:轉換速度和內存使用情況,我從這個測試下降數據集。
Cellarray(doubles)->matrix:d->m: 0.865s
Cellarray(mixed)->structarray:c->sc: 0.268s
Cellarray(doubles)->structarray:d->sd: 0.430s
Cellarray(mixed)->struct of arrays:c->sac: 0.361s
Cellarray(doubles)->struct of arrays:d->sad: 0.887s
Name Size Bytes Class Attributes
c 100000x10 68000000 cell
d 100000x10 68000000 cell
m 100000x10 8000000 double
sac 1x1 38001240 struct
sad 1x1 8001240 struct
sc 100000x1 68000640 struct
sd 100000x1 68000640 struct
================== CODE:TEST#1
%% cellarray
c = cell(100000,10);
c(:,[1,3,5,7,9]) = num2cell(zeros(100000,5));
c(:,[2,4,6,8,10]) = repmat({'asdf'}, 100000, 5);
cols = strcat('Var', strtrim(cellstr(num2str((1:10)'))))';
te = tic;
for iii=1:1000
x = c(1234,5);
end
te = toc(te);
fprintf('cellarray:%0.3fs\n', te);
%% dataset
ds = dataset({ c, cols{:} });
te = tic;
for iii=1:1000
x = ds(1234,5);
end
te = toc(te);
fprintf('dataset:%0.3fs\n', te);
%% structarray
s = cell2struct(c, cols, 2);
te = tic;
for iii=1:1000
x = s(1234).Var5;
end
te = toc(te);
fprintf('structarray:%0.3fs\n', te);
%% struct of arrays
for iii=1:numel(cols)
if iii/2==floor(iii/2) % even => string
sac.(cols{iii}) = c(:,iii);
else
sac.(cols{iii}) = cell2mat(c(:,iii));
end
end
te = tic;
for iii=1:1000
x = sac.Var5(1234);
end
te = toc(te);
fprintf('struct of array:%0.3fs\n', te);
============= ===== CODE:TEST#2
%% cellarray
% c - cellarray containing mixed type
c = cell(100000,10);
c(:,[1,3,5,7,9]) = num2cell(zeros(100000,5));
c(:,[2,4,6,8,10]) = repmat({'asdf'}, 100000, 5);
cols = strcat('Var', strtrim(cellstr(num2str((1:10)'))))';
% c - cellarray containing doubles only
d = num2cell(zeros(100000, 10));
%% matrix
% doubles only
te = tic;
m = cell2mat(d);
te = toc(te);
fprintf('Cellarray(doubles)->matrix:d->m: %0.3fs\n', te);
%% structarray
% mixed
te = tic;
sc = cell2struct(c, cols, 2);
te = toc(te);
fprintf('Cellarray(mixed)->structarray:c->sc: %0.3fs\n', te);
% doubles
te = tic;
sd = cell2struct(d, cols, 2);
te = toc(te);
fprintf('Cellarray(doubles)->structarray:d->sd: %0.3fs\n', te);
%% struct of arrays
% mixed
te = tic;
for iii=1:numel(cols)
if iii/2==floor(iii/2) % even => string
sac.(cols{iii}) = c(:,iii);
else
sac.(cols{iii}) = cell2mat(c(:,iii));
end
end
te = toc(te);
fprintf('Cellarray(mixed)->struct of arrays:c->sac: %0.3fs\n', te);
% doubles
te = tic;
for iii=1:numel(cols)
sad.(cols{iii}) = cell2mat(d(:,iii));
end
te = toc(te);
fprintf('Cellarray(doubles)->struct of arrays:d->sad: %0.3fs\n', te);
%%
clear iii cols te;
whos
雖然'數據集'確實很慢,但您的計時非常緩慢。我在訪問時獲取「數據集:0.7s」,而其他人的順序與您的相同。我在32位的WinXP – Amro 2013-04-26 19:57:37