2015-07-20 20 views
12

我正在做一些KMeans聚類在一個大而密集的數據集上,我試圖找出最佳的方式來可視化這些聚類。使用hexbin重疊繪製多組數據

在2D中,它看起來像hexbin會做得很好,但我無法重疊繪製同一圖上的集羣。我想在每個羣集上分別使用hexbin,每個羣集都有不同的顏色映射,但由於某種原因,這似乎不起作用。 該圖顯示了我在繪製第二組和第三組數據時所得到的結果。

有關如何去解決這個問題的任何建議? enter image description here

一些擺弄後,我能夠與Seaborn's kdeplot

enter image description here

+4

有趣的問題。你有沒有嘗試過爲每個hexbin設置'alpha'?你能做一個最小,完整和可驗證的例子嗎? – farenorth

+1

@Labibah這個問題看起來很有趣,同樣的問題:你能做一個最小化,完整和可驗證的例子嗎? – rll

+0

我同意@ farenorth。你可以用你的繪圖代碼發佈/創建一個帶有隨機數字的假數據集(格式化完成集羣后的樣子)。對於其他人來說,玩的速度會快很多...... – tmthydvnprt

回答

3

個人,使這個我覺得從kdeplot您的解決方案是相當不錯的(雖然我的工作一點上部分是集羣攔截)。在任何情況下,作爲對你的問題的迴應,你可以給hexbin提供一個最小數量(把所有空單元保留爲透明)。這裏有一個小功能產生隨機集羣的人,可能要進行一些實驗(在評論你的問題似乎建立了大量的來自用戶的興趣,下跌自由地使用它):

import numpy as np 
import matplotlib.pyplot as plt 

# Building random clusters 
def cluster(number): 
    def clusterAroundX(a,b,number): 
     x = np.random.normal(size=(number,)) 
     return (x-x.min())*(b-a)/(x.max()-x.min())+a 
    def clusterAroundY(x,m,b): 
     y = x.copy() 
     half = (x.max()-x.min())/2 
     middle = half+x.min() 
     for i in range(x.shape[0]): 
      std = (x.max()-x.min())/(2+10*(np.abs(middle-x[i])/half)) 
      y[i] = np.random.normal(x[i]*m+b,std) 
     return y + np.abs(y.min()) 
    m,b = np.random.randint(-700,700)/100,np.random.randint(0,50) 
    print(m,b) 
    f = np.random.randint(0,30) 
    l = f + np.random.randint(10,50) 
    x = clusterAroundX(f,l,number) 
    y = clusterAroundY(x,m,b) 
    return x,y 

,使用此代碼我已經制作了一些集羣,用散點圖繪製它們(我通常使用它來進行我自己的聚類分析,但我想我應該看看seaborn),hexbin,imshow(更改pcolormesh以獲取更多控制權限)和contourf:

clusters = 5 
samples = 300 
xs,ys = [],[] 
for i in range(clusters): 
    x,y = cluster(samples) 
    xs.append(x) 
    ys.append(y) 

# SCATTERPLOT 
alpha = 1 
for i in range(clusters): 
    x,y = xs[i],ys[i] 
    color = (np.random.randint(0,255)/255,np.random.randint(0,255)/255,np.random.randint(0,255)/255) 
    plt.scatter(x,y,c = color,s=90,alpha=alpha) 
plt.show() 

# HEXBIN 
# Hexbin seems a bad choice because I think you cant control the size of the hexagons. 
alpha = 1 
cmaps = ['Reds','Blues','Purples','Oranges','Greys'] 
for i in range(clusters): 
    x,y = xs[i],ys[i] 
    plt.hexbin(x,y,gridsize=20,cmap=cmaps.pop(),mincnt=1) 
plt.show() 

# IMSHOW 
alpha = 1 
cmaps = ['Reds','Blues','Purples','Oranges','Greys'] 
xmin,xmax = min([i.min() for i in xs]), max([i.max() for i in xs]) 
ymin,ymax = min([i.min() for i in ys]), max([i.max() for i in ys]) 
nums = 30 
xsize,ysize = (xmax-xmin)/nums,(ymax-ymin)/nums 
im = [np.zeros((nums+1,nums+1)) for i in range(len(xs))] 
def addIm(im,x,y): 
    for i,j in zip(x,y): 
     im[i,j] = im[i,j]+1 
    return im 
for i in range(len(xs)): 
    xo,yo = np.int_((xs[i]-xmin)/xsize),np.int_((ys[i]-ymin)/ysize) 
    #im[i][xo,yo] = im[i][xo,yo]+1 
    im[i] = addIm(im[i],xo,yo) 
    im[i] = np.ma.masked_array(im[i],mask=(im[i]==0)) 
for i in range(clusters): 
    # REPLACE BY pcolormesh if you need more control over image locations. 
    plt.imshow(im[i].T,origin='lower',interpolation='nearest',cmap=cmaps.pop()) 
plt.show() 

# CONTOURF 
cmaps = ['Reds','Blues','Purples','Oranges','Greys'] 
for i in range(clusters): 
    # REPLACE BY pcolormesh if you need more control over image locations. 
    plt.contourf(im[i].T,origin='lower',interpolation='nearest',cmap=cmaps.pop()) 
plt.show() 

,結果是folloing:

scatterplot clusters

hexbin clusters

imshow clusters

countourf clusters