MATLAB中的ismember（）函數的快速版本

我的問題是關於尋找一種替代方法來以更快的方式在MATLAB中執行什麼ismember()。MATLAB中的ismember（）函數的快速版本

這裏是我的問題：

M [92786253*1] (a: roughly 100M rows) 
x [749*1]  (b: # of rows can vary from 100 to 10K)

我想找到b行是共同存在於a（的行索引）。 對於不同版本的b，此操作需要重複約10M次。

的一般做法：

tic 
ind1 = ismember(M,x); 
toc 

Elapsed time is 0.515627 seconds.

快速方法：

tic 
n = 1; 
ind2 = find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3)); 
toc 

Error using bsxfun 
Requested 92786253x1x749 (64.7GB) array exceeds maximum array size preference. 
Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. 
See array size limit or preference panel for more information. 

Error in demo_ismember_fast (line 23) 
ind2 = find(any(all(bsxfun(@eq,reshape(x.',1,n,[]),M),2),3))

第二種方法通常比正常的速度更快的15-20倍，但是在這種情況下，我不能用它來限制內存。有沒有任何建議如何加快這一行動？感謝您與我分享專家意見！

來源

2017-09-25 YAS

我猜想購買64Gb的RAM？：P這是一個非常大的問題，你需要預計它很慢 –

如果是這樣，爲什麼在第一種情況下，我沒有收到任何錯誤？我也認爲除了使用'ismember（）'外，還有其他一些技巧。 – YAS

那裏有什麼限制？「M」還是「x」分類？ – Divakar

如果您可以使用排序的a這裏有兩種替代方法。在開始100M迭代之前，一些需要的輸入變量和輸出變量ind被初始化，並且在每次迭代ind被修改，並且最後所有元素被設置爲false;

interp1：

s=sort(M); 
edge = [-Inf s(2:end) Inf]; 
v = [1:numel(M) numel(M)]; 
ind = false(size(M)); 
%for ... 100M iterations 
    tic 
    bin = interp1(edge,v,x,'previous'); 
    ind(bin)= ind(bin)==x; 
    toc 
    %... 
    ind(bin) = false;%at the end of each loop set all elements of ind to 0; 
%end

histcounts：

來源

2017-09-25 13:10:32 rahnema1

感謝您提出這些建議，在我原來的問題中，我們有'M'和'x'，你可以根據這個修改答案嗎？也有你的納入，因爲我提到的100M迭代或它需要1次迭代？ – YAS

假設for循環有100M次迭代。我會改變變量。 – rahnema1

感謝您的澄清，你能否提一下'out'代表什麼？ – YAS

您可能會發現內部（內置）ismembc功能非常有用 - 它可以是一個數量級比ismember快：http://UndocumentedMatlab.com/blog/ismembc-undocumented-helper-function

不是e ismembc只適用於排序的非稀疏非Nan數字數據。

來源

2017-09-26 07:56:34

MATLAB中的ismember（）函數的快速版本

回答

相關問題