2014-07-08

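Python scatter plot of 4D data

I have 4D data arrays that I want to scatter plot. The data can be seen as x and y coordinates for each pair of values of two additional parameters.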

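I would like to "flatten" the plot into a 2D scatter plot where the two extra parameters are represented by different colors, e.g. one color for each pair of the two parameters. Alternatively, I would like points that are plotted for only a few parameter pairs to look lighter, and points that are plotted for many parameter pairs to look heavier/darker. Maybe this could be achieved by "stacking" somewhat semi-transparent points on top of each other?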

Is there some standard way to do this in Python, e.g. using matplotlib?
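A minimal sketch of the first idea (one color per parameter pair), assuming the 4D data is available as a hypothetical boolean array `mask[x, y, p1, p2]` that is True wherever a point should be drawn. `flatten_mask` and `plot_flattened` are illustrative names, not an existing API; each (p1, p2) pair is numbered `k = p1 * n2 + p2` and mapped onto a colormap:

```python
import numpy as np
import matplotlib.pyplot as plt


def flatten_mask(mask):
    """Flatten a 4D boolean mask[x, y, p1, p2] into x, y and a pair index.

    Every True entry becomes one point; k = p1 * n2 + p2 numbers the
    (p1, p2) parameter pairs 0 .. n1*n2 - 1.
    """
    x, y, p1, p2 = np.nonzero(mask)
    k = p1 * mask.shape[3] + p2
    return x, y, k


def plot_flattened(mask, cmap="viridis"):
    """Scatter all points in one 2D plot, colored by parameter pair."""
    x, y, k = flatten_mask(mask)
    n_pairs = mask.shape[2] * mask.shape[3]
    fig, ax = plt.subplots()
    # Fix the color range so the same pair always gets the same color
    sc = ax.scatter(x, y, c=k, cmap=cmap, vmin=0, vmax=n_pairs - 1, s=15)
    fig.colorbar(sc, ax=ax, label="parameter-pair index")
    return fig, ax


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = rng.random((10, 10, 3, 4)) < 0.1  # hypothetical sparse 4D mask
    plot_flattened(mask)
    plt.show()
```

One limitation of this design: where several parameter pairs share the same (x, y), their points overlap and only the last one drawn stays visible, so the color alone cannot show how many pairs hit a given point.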


Maybe a scatter plot matrix is a better solution. See [here](http://pandas.pydata.org/pandas-docs/stable/visualization.html#scatter-matrix-plot) for an example. – Andrej


That looks interesting. Unfortunately, I have no experience with `pandas`, but maybe I should check it out. –


There are related pure-matplotlib examples in [this question](http://stackoverflow.com/questions/7941207/is-there-a-function-to-make-scatterplot-matrices-in-matplotlib). [@tisimst's answer](http://stackoverflow.com/a/16489216/3751373), which is a refactoring of [@Joe Kington's](http://stackoverflow.com/a/7941594/3751373), seems to be the most complete. –

Answers


I tried the approach I suggested of "stacking" semi-transparent scatter points on top of each other:

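import numpy as np
import matplotlib.pyplot as plt

# data1 and data2 are 4D arrays of the same shape
# (n_delta, n_rho, len(param1), len(param2)), defined elsewhere.
for ii in range(len(param1)):
    for jj in range(len(param2)):
        # (delta, rho) indices where data1 < data2 for this parameter pair
        delta_idx, rho_idx = np.where(data1[:, :, ii, jj] < data2[:, :, ii, jj])
        # Very low alpha: points shared by many parameter pairs stack up darker
        plt.scatter(delta_idx, rho_idx, marker='o', c='k', alpha=0.01)
# Raw strings: in a normal string '\r' in '$\rho$' would be a carriage return
plt.xlabel(r'$\delta$')
plt.ylabel(r'$\rho$')
plt.show()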

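The 2D points I described in my question are actually the points where a value in data1 is less than the corresponding value in data2. This produced the following plot:

[Figure: stacked scatter plot]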

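A lot more could be done to make the plot nicer, but I was not very happy with how it looked, so I tried another approach. Anyway, I'm posting it here in case someone finds it useful.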
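To see why stacking low-alpha points works as a density indicator, here is a small worked sketch, assuming plain "source-over" alpha compositing of black points on a white background: each extra layer lets a fraction (1 - alpha) of what is underneath show through, so n stacked points reach an opacity of 1 - (1 - alpha)^n. Since `np.where` returns each (delta, rho) at most once per parameter pair, n here is the number of parameter pairs for which data1 < data2 at that point. The helper names are hypothetical:

```python
import numpy as np


def stacked_opacity(alpha, n):
    """Opacity after drawing n identical points with the given alpha on top of each other.

    Assumes ordinary source-over alpha compositing: each layer lets a
    fraction (1 - alpha) of what is underneath show through.
    """
    return 1.0 - (1.0 - alpha) ** np.asarray(n, dtype=float)


def count_for_opacity(alpha, target):
    """Number of stacked points needed to reach a target opacity (0 <= target < 1)."""
    return np.log(1.0 - target) / np.log(1.0 - alpha)


if __name__ == "__main__":
    for n in (1, 10, 100, 500):
        print(n, round(float(stacked_opacity(0.01, n)), 3))
```

With alpha = 0.01, a point shared by 100 parameter pairs reaches only about 63% opacity, and the curve flattens as n grows, so differences between heavily shared points become hard to see. Counting the occurrences directly avoids that saturation.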


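As an alternative to the "stacked" scatter plot, I tried accumulating the number of occurrences of data1 < data2 in a 2D "occurrence map". I then plotted this map using pcolormesh (imported from prettyplotlib to make it look nicer):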

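import numpy as np
import prettyplotlib as ppl

# Fraction of (param1, param2) pairs for which data1 < data2, per (x, y) cell
occurrence_map = np.sum(data1 < data2, axis=(2, 3), dtype=float) / np.prod(data1.shape[2:])
# Fix the color range to [0, 1]
ppl.pcolormesh(occurrence_map, vmin=0, vmax=1)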

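The normalization produces a relative measure of occurrence, i.e. for what fraction of the parameter pairs (the last two dimensions of data1 and data2) data1 < data2 holds. The plot is then configured so that the color values range from 0 to 1. This produces the following plot, which I am happier with: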

[Figure: pcolormesh plot of relative occurrences]
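For readers without prettyplotlib, here is a sketch of the same occurrence map with plain matplotlib. It assumes `data1` and `data2` have the 4D shape described above; the function names and the random demo data are hypothetical. Averaging a boolean array over the last two axes gives the same fraction as the sum-and-divide expression:

```python
import numpy as np
import matplotlib.pyplot as plt


def occurrence_map(data1, data2):
    """Fraction of (param1, param2) pairs for which data1 < data2, per (x, y) cell.

    The mean of a boolean array over the last two axes is the fraction
    of True entries, i.e. sum / number of parameter pairs.
    """
    return np.mean(data1 < data2, axis=(2, 3))


def plot_occurrence_map(occ):
    """Plot the map with plain matplotlib, colors fixed to the range [0, 1]."""
    fig, ax = plt.subplots()
    mesh = ax.pcolormesh(occ, vmin=0, vmax=1)
    fig.colorbar(mesh, ax=ax, label="fraction of parameter pairs with data1 < data2")
    return fig, ax


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 20 x 30 grid, 5 x 6 parameter pairs
    data1 = rng.random((20, 30, 5, 6))
    data2 = rng.random((20, 30, 5, 6))
    plot_occurrence_map(occurrence_map(data1, data2))
    plt.show()
```

Note that pcolormesh draws the first array axis along y, so this map is transposed relative to the stacked scatter plot, where the first index went on the x axis; pass `occ.T` if the two should line up. Also, any comparison with NaN is False, so NaN entries count as "not data1 < data2" here.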


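The comments about scatter plot matrices prompted me to try something similar as well. A scatter plot matrix is not exactly what I was looking for, but I took the code from @tisimst's answer suggested in the comments and adapted it a bit, as follows: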

import numpy as np
import matplotlib.pyplot as plt

def scatterplot_matrix(data, **kwargs):
    """Plot a matrix of pcolormesh subplots for a 4D array.

    The first two dimensions of data are plotted as a mesh of values,
    one subplot for each combination of the last two dimensions. The
    grid therefore has data.shape[2] rows and data.shape[3] columns.
    Additional keyword arguments are passed on to matplotlib's
    pcolormesh. Returns the matplotlib figure containing the subplot
    grid.
    """
    assert data.ndim == 4, 'data must be 4-dimensional.'
    nrows, ncols = data.shape[2], data.shape[3]
    # squeeze=False keeps axes 2D even if nrows or ncols is 1
    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(8, 8),
                             squeeze=False)
    fig.subplots_adjust(hspace=0.0, wspace=0.0)

    # Hide all ticks and labels; the grid position identifies the parameters
    for ax in axes.flat:
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)

    # One pcolormesh per (third index, fourth index) pair
    for ii in range(nrows):
        for jj in range(ncols):
            axes[ii, jj].pcolormesh(data[:, :, ii, jj], **kwargs)

    return fig

if __name__ == '__main__':
    np.random.seed(1977)
    data = np.random.random([10] * 4)
    # pcolormesh takes colormap options (cmap, vmin, vmax), not
    # line/marker options such as linestyle, marker or mfc
    fig = scatterplot_matrix(data, cmap='gray')
    fig.suptitle('Simple pcolormesh matrix')
    plt.show()

I saved the module above as datamatrix.py and used it as follows:

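import numpy as np
import matplotlib.pyplot as plt
import brewer2mpl
import datamatrix  # the module above, saved as datamatrix.py

colors = brewer2mpl.get_map('RdBu', 'Diverging', 11).mpl_colormap
# +1 where data1 < data2, -1 where data1 > data2, 0 where equal; NaNs are masked.
# Negated because the 'RdBu' colormap is the wrong way around.
indicator = np.ma.masked_invalid(-np.sign(data1 - data2))
# vmin/vmax give every subplot the same color scale
fig = datamatrix.scatterplot_matrix(indicator, cmap=colors, vmin=-1, vmax=1)
plt.show()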

The brewer2mpl and colormap parts can be left out; they are just some coloring I was playing around with. The result is the following plot:

[Figure: matrix of pcolormesh plots of occurrences for individual parameter values]

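The dimensions of the "outer" matrix are the two parameters (the last two dimensions of data1 and data2). Each pcolormesh plot in the matrix is then an "occurrence map", somewhat similar to the one in this answer, but a binary map for each individual parameter pair. The white lines at the bottom of some plots are regions where data1 and data2 are equal. The white dot in the upper right corner of each plot is a nan value in the data.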
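A small check of what the indicator values mean, using the same expression as above on a hypothetical 1 x 1 grid with 2 x 2 parameter pairs. Averaging the "+1" cells over the last two axes recovers the occurrence map from the previous answer, i.e. that map is the average of the binary maps shown in this matrix. The helper names are illustrative:

```python
import numpy as np


def indicator_map(data1, data2):
    """+1 where data1 < data2, -1 where data1 > data2, 0 where equal; NaNs masked.

    Same expression as in the answer above (negated sign of the difference).
    """
    return np.ma.masked_invalid(-np.sign(data1 - data2))


def fraction_less(indicator):
    """Per-(x, y) fraction of parameter pairs where the indicator is +1.

    Masked (NaN) entries are filled with False, matching how
    data1 < data2 treats NaN, so this equals the occurrence map.
    """
    is_less = (indicator == 1).filled(False)
    return is_less.mean(axis=(2, 3))
```

If brewer2mpl is not available, matplotlib also ships an 'RdBu' colormap (and the reversed 'RdBu_r'), which could stand in for the brewer2mpl one.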