2012-07-08 228 views
2

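Displaying rows on clustered kmeans data

Hi, when I plot clustered data in a figure, is there a way to show which row a data point belongs to when I scroll over it?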

[image: scatter plot of the clustered data]

In the picture above, I would like a way to select or scroll over a point and be told which row it belongs to.

Here is the code:

%% dimensionality reduction 
columns = 6; 
[U,S,V]=svds(fulldata,columns); 
%% randomly select dataset 
rows = 1000; 
columns = 6; 

%# pick random rows 
indX = randperm(size(fulldata,1)); 
indX = indX(1:rows); 

%# pick random columns 
indY = randperm(size(U,2));  %# index within U, which has only 'columns' columns after svds 
indY = indY(1:columns); 

%# filter data 
data = U(indX,indY); 
%% apply normalization method to every cell 
data = data./repmat(sqrt(sum(data.^2)),size(data,1),1); 

%% generate sample data 
K = 6; 
numObservarations = 1000; 
dimensions = 6; 

%% cluster 
opts = statset('MaxIter', 100, 'Display', 'iter'); 
[clustIDX, clusters, interClustSum, Dist] = kmeans(data, K, 'options',opts, ... 
'distance','sqEuclidean', 'EmptyAction','singleton', 'replicates',3); 

%% plot data+clusters 
figure, hold on 
scatter3(data(:,1),data(:,2),data(:,3), 5, clustIDX, 'filled') 
scatter3(clusters(:,1),clusters(:,2),clusters(:,3), 100, (1:K)', 'filled') 
hold off, xlabel('x'), ylabel('y'), zlabel('z') 

%% plot clusters quality 
figure 
[silh,h] = silhouette(data, clustIDX); 
avrgScore = mean(silh); 

%% Assign data to clusters 
% calculate distance (squared) of all instances to each cluster centroid 
D = zeros(numObservarations, K);  % init distances 
for k=1:K 
%# squared Euclidean distance: d = sum((x-y).^2) 
D(:,k) = sum(((data - repmat(clusters(k,:),numObservarations,1)).^2), 2); 
end 

% find for all instances the cluster closest to it 
[minDists, clusterIndices] = min(D, [], 2); 

% compare it with what you expect it to be 
sum(clusterIndices == clustIDX) 

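Alternatively, is there a way to output the clustered data, normalized and reorganized into its original format, with a column appended at the end recording which row of the original "fulldata" each point belongs to?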
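A minimal sketch of the row bookkeeping asked about here. It assumes the sampled row numbers are still available in indX, as in the code above. The small fulldata matrix and the random clustIDX are hypothetical stand-ins so the snippet runs on its own; in the real script clustIDX would come from kmeans.

```matlab
%# hypothetical stand-ins for the variables in the question
fulldata = rand(50, 8);               %# original matrix
rows = 10;
indX = randperm(size(fulldata,1));
indX = indX(1:rows);                  %# original row numbers of the sample
clustIDX = randi(3, rows, 1);         %# placeholder for the kmeans labels

%# row i of the clustered data came from row indX(i) of fulldata
rowMap = [indX(:), clustIDX];         %# columns: original row, cluster

%# original (un-normalized) rows with the cluster label appended as last column
labelled = [fulldata(indX,:), clustIDX];

%# original row numbers of the points assigned to cluster 2
rowsInCluster2 = indX(clustIDX == 2);
```

Because indX is stored before any filtering or normalization, the mapping stays valid no matter how data is transformed afterwards, as long as its rows are not reordered.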

+0

What is wrong with the cluster center in the top right? And the two dark blue groups do not look very sensible to me. – 2012-07-08 18:17:59

+1

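To me there are 3 distinct clusters. I have not come across a program that can sensibly pick the right number of clusters, so it is trial and error at the moment, and of course outlier removal is still a work in progress. But I really need a way to quickly find out which rows or data these points represent. – 2012-07-08 20:15:35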

+1

Look at the silhouette to choose the number of clusters: http://www.mathworks.com/help/toolbox/stats/bq_679x-18.html – Dan 2012-07-09 09:21:35

Answers

5

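You can use the data cursors feature, which shows a tooltip when you select a point in the plot. You can supply a modified update function to display all sorts of information about the selected point.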

Here is a working example:

function customCusrorModeDemo() 
    %# data 
    D = load('fisheriris'); 
    data = D.meas; 
    [clustIdx,labels] = grp2idx(D.species); 
    K = numel(labels); 
    clr = hsv(K); 

    %# instance indices grouped according to class 
    ind = accumarray(clustIdx, 1:size(data,1), [K 1], @(x){x}); 

    %# plot 
    %#gscatter(data(:,1), data(:,2), clustIdx, clr) 
    hLine = zeros(K,1); 
    for k=1:K 
     hLine(k) = line(data(ind{k},1), data(ind{k},2), data(ind{k},3), ... 
      'LineStyle','none', 'Color',clr(k,:), ... 
      'Marker','.', 'MarkerSize',15); 
    end 
    xlabel('SL'), ylabel('SW'), zlabel('PL') 
    legend(hLine, labels) 
    view(3), box on, grid on 

    %# data cursor 
    hDCM = datacursormode(gcf); 
    set(hDCM, 'UpdateFcn',@updateFcn, 'DisplayStyle','window') 
    set(hDCM, 'Enable','on') 

    %# callback function 
    function txt = updateFcn(~,evt) 
     hObj = get(evt,'Target'); %# line object handle 
     idx = get(evt,'DataIndex'); %# index of nearest point 

     %# class index of data point 
     cIdx = find(hLine==hObj, 1, 'first'); 

     %# instance index (index into the entire data matrix) 
     idx = ind{cIdx}(idx); 

     %# output text 
     txt = { 
      sprintf('SL: %g', data(idx,1)) ; 
      sprintf('SW: %g', data(idx,2)) ; 
      sprintf('PL: %g', data(idx,3)) ; 
      sprintf('PW: %g', data(idx,4)) ; 
      sprintf('Index: %d', idx) ; 
      sprintf('Class: %s', labels{clustIdx(idx)}) ; 
     }; 
    end 

end 

Here is how it looks in 2D and 3D views (with different display styles):

[images: screenshot_2D, screenshot_3D]
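The same idea can be adapted to the question's scatter3 plot so that the tooltip reports the row in fulldata. This is only a sketch under stated assumptions: fulldata, indX and clustIDX below are hypothetical stand-ins (random labels instead of a real kmeans result), and the selected point is found by matching the cursor's Position against the plotted coordinates, which assumes no two plotted points share identical coordinates. It is written as a plain script with anonymous functions, so it can be pasted at the prompt without defining a function file.

```matlab
%# hypothetical stand-ins for the question's variables
fulldata = rand(200, 6);
indX = randperm(size(fulldata,1));
indX = indX(1:50);                     %# sampled rows of fulldata
data = fulldata(indX, :);
clustIDX = randi(3, numel(indX), 1);   %# placeholder for the kmeans labels

%# lookup: plotted [x y z] position -> row index into data
findRow = @(pos) find(ismember(data(:,1:3), pos, 'rows'), 1);

%# tooltip text: original row in fulldata and assigned cluster
makeTip = @(~, evt) { ...
    sprintf('Row in fulldata: %d', indX(findRow(get(evt,'Position')))) ; ...
    sprintf('Cluster: %d', clustIDX(findRow(get(evt,'Position')))) };

%# plot and enable the data cursor
figure
scatter3(data(:,1), data(:,2), data(:,3), 20, clustIDX, 'filled')
hDCM = datacursormode(gcf);
set(hDCM, 'UpdateFcn',makeTip, 'Enable','on')
```

Matching on Position avoids relying on how a scatter object reports DataIndex, at the cost of a search over the plotted points on each click.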

+0

Hi! I pasted this function into MATLAB and got the following error: "Error: Function definitions are not permitted in this context." – 2015-08-05 13:45:29