2013-05-16
40

matplotlib: group boxplots

Is there a way to group boxplots in matplotlib?

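Assume we have three groups "A", "B" and "C", and for each of them we want to create a boxplot for both "apples" and "oranges". If grouping is not possible directly, we could create all six combinations and place them side by side. What is the simplest way to visualize the groupings? I am trying to avoid setting the tick labels to something like "A + apples", because my scenario involves much longer names than "A".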

Answers

68

How about using colors to differentiate between "apples" and "oranges", and spacing to separate "A", "B" and "C"?

Something like this:

from matplotlib.pyplot import plot, show, savefig, xlim, figure, \
                              ylim, legend, boxplot, setp, axes

# function for setting the colors of the box plot pairs
def setBoxColors(bp):
    setp(bp['boxes'][0], color='blue')
    setp(bp['caps'][0], color='blue')
    setp(bp['caps'][1], color='blue')
    setp(bp['whiskers'][0], color='blue')
    setp(bp['whiskers'][1], color='blue')
    # since matplotlib 1.4 there is one fliers artist per box (not two)
    setp(bp['fliers'][0], markeredgecolor='blue')
    setp(bp['medians'][0], color='blue')

    setp(bp['boxes'][1], color='red')
    setp(bp['caps'][2], color='red')
    setp(bp['caps'][3], color='red')
    setp(bp['whiskers'][2], color='red')
    setp(bp['whiskers'][3], color='red')
    setp(bp['fliers'][1], markeredgecolor='red')
    setp(bp['medians'][1], color='red')

# Some fake data to plot
A = [[1, 2, 5], [7, 2]]
B = [[5, 7, 2, 2, 5], [7, 2, 5]]
C = [[3, 2, 5, 7], [6, 7, 3]]

fig = figure()
ax = axes()
# the old hold(True) call is not needed: plots always accumulate on the
# current axes, and hold() no longer exists in matplotlib 3.x

# first boxplot pair
bp = boxplot(A, positions=[1, 2], widths=0.6)
setBoxColors(bp)

# second boxplot pair
bp = boxplot(B, positions=[4, 5], widths=0.6)
setBoxColors(bp)

# third boxplot pair
bp = boxplot(C, positions=[7, 8], widths=0.6)
setBoxColors(bp)

# set axes limits and labels (tick positions first, then their labels)
xlim(0, 9)
ylim(0, 9)
ax.set_xticks([1.5, 4.5, 7.5])
ax.set_xticklabels(['A', 'B', 'C'])

# draw temporary red and blue lines and use them to create a legend
hB, = plot([1, 1], 'b-')
hR, = plot([1, 1], 'r-')
legend((hB, hR), ('Apples', 'Oranges'))
hB.set_visible(False)
hR.set_visible(False)

savefig('boxcompare.png')
show()

grouped box plot

+0

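This is a really nice solution, because you distinguish the groups not only by color but also by position! As there seems to be no built-in feature for this, it is exactly what I needed. Thank you very much! – bluenote10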

+4

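Because of https://github.com/matplotlib/matplotlib/issues/3544, this example works perfectly with matplotlib 1.3.1 but not with 1.4.0 (although the data you chose has no outliers, so the problem is not visible, it still raises an error when accessing `bp['fliers'][2]`). – anonymous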
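A minimal, version-independent way to color each box of a pair, assuming the layout matplotlib has used since 1.4: one `boxes`, `medians` and `fliers` artist per box, and two `whiskers` and `caps` artists per box. `color_boxes` is a hypothetical helper, not a matplotlib function:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show the plot interactively
import matplotlib.pyplot as plt


def color_boxes(bp, colors):
    """Color box i of a boxplot dict with colors[i] (hypothetical helper).

    Assumes matplotlib >= 1.4: one 'boxes', 'medians' and 'fliers' artist
    per box, and two 'whiskers' and 'caps' artists per box.
    """
    for i, color in enumerate(colors):
        plt.setp(bp['boxes'][i], color=color)
        plt.setp(bp['medians'][i], color=color)
        # flier markers have their own edge color, so set it explicitly
        plt.setp(bp['fliers'][i], markeredgecolor=color)
        for j in (2 * i, 2 * i + 1):  # the two whiskers/caps of box i
            plt.setp(bp['whiskers'][j], color=color)
            plt.setp(bp['caps'][j], color=color)


fig, ax = plt.subplots()
bp = ax.boxplot([[1, 2, 5], [7, 2]], positions=[1, 2], widths=0.6)
color_boxes(bp, ['blue', 'red'])
```

Because it only indexes `bp['fliers'][i]` once per box, it works whether or not the data contains outliers.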

+0

In pandas you can apparently set the color of a boxplot by passing a color argument: `data.plot(kind='box', color='blue')` – Peter9192
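pandas box plots also accept a dictionary for `color`, with the keys `boxes`, `whiskers`, `medians` and `caps`, so each element type can get its own color. A small sketch with random data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show the plot interactively
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(12, 2), columns=['Apples', 'Oranges'])

# one color per element type instead of a single color for everything
color = {'boxes': 'blue', 'whiskers': 'blue', 'medians': 'red', 'caps': 'blue'}
ax = df.plot(kind='box', color=color)
```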

24

One simple way is to use pandas. I adapted an example from the plotting documentation:

In [1]: import numpy as np
   ...: import pandas as pd

In [2]: df = pd.DataFrame(np.random.rand(12, 2), columns=['Apples', 'Oranges'])

In [3]: df['Categories'] = pd.Series(list('AAAABBBBCCCC'))

In [4]: import matplotlib.pyplot as plt; plt.style.use('ggplot')  # stands in for pd.options.display.mpl_style = 'default', which newer pandas no longer supports

In [5]: df.boxplot(by='Categories')
Out[5]:
array([<matplotlib.axes.AxesSubplot object at 0x51a5190>,
       <matplotlib.axes.AxesSubplot object at 0x53fddd0>], dtype=object)

pandas boxplot

+0

Thanks a lot! This is also a very interesting suggestion! – bluenote10

+1

I can't figure out how to do the inverse of this: one boxplot per fruit, grouped by category (the same as in Molly's answer). Is there a way? – naught101

+0

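Not sure what the "inverse" should be. If you mean exactly the plot from Molly's answer (a single subplot), that is not possible with the pandas plotting commands. You would have to use matplotlib and a more complex script. – bmu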
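A rough sketch of such a matplotlib script, reusing the `df` layout from the pandas answer (columns `Apples`, `Oranges` and `Categories`, random values): the frame is grouped by category and, for each fruit, one box per category is drawn at shifted positions on a single axes. The spacing of 3 units per category (two fruits plus one empty slot) is an arbitrary choice:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# same layout as the pandas example: two fruit columns plus a category column
df = pd.DataFrame(np.random.rand(12, 2), columns=['Apples', 'Oranges'])
df['Categories'] = pd.Series(list('AAAABBBBCCCC'))

fruits = ['Apples', 'Oranges']
colors = ['#D7191C', '#2C7BB6']
groups = df.groupby('Categories')      # categories come out in sorted order
names = [name for name, _ in groups]

fig, ax = plt.subplots()
all_boxes = []
for j, (fruit, color) in enumerate(zip(fruits, colors)):
    values = [g[fruit].to_numpy() for _, g in groups]   # one array per category
    positions = np.arange(len(values)) * 3 + j           # 3 = two fruits + one gap
    bp = ax.boxplot(values, positions=positions, widths=0.6)
    for key in ('boxes', 'whiskers', 'caps', 'medians'):
        plt.setp(bp[key], color=color)
    all_boxes.extend(bp['boxes'])
    ax.plot([], c=color, label=fruit)   # empty line used only for the legend

ax.set_xticks(np.arange(len(names)) * 3 + 0.5)   # centre of each fruit pair
ax.set_xticklabels(names)
ax.legend()
```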

15

Here is my version. It stores the data by category.

import matplotlib.pyplot as plt 
import numpy as np 

data_a = [[1,2,5], [5,7,2,2,5], [7,2,5]] 
data_b = [[6,4,2], [1,2,5,3,2], [2,3,5,1]] 

ticks = ['A', 'B', 'C'] 

def set_box_color(bp, color): 
    plt.setp(bp['boxes'], color=color) 
    plt.setp(bp['whiskers'], color=color) 
    plt.setp(bp['caps'], color=color) 
    plt.setp(bp['medians'], color=color) 

plt.figure() 

bpl = plt.boxplot(data_a, positions=np.array(range(len(data_a)))*2.0-0.4, sym='', widths=0.6)
bpr = plt.boxplot(data_b, positions=np.array(range(len(data_b)))*2.0+0.4, sym='', widths=0.6)
set_box_color(bpl, '#D7191C') # colors are from http://colorbrewer2.org/ 
set_box_color(bpr, '#2C7BB6') 

# draw temporary red and blue lines and use them to create a legend 
plt.plot([], c='#D7191C', label='Apples') 
plt.plot([], c='#2C7BB6', label='Oranges') 
plt.legend() 

plt.xticks(range(0, len(ticks) * 2, 2), ticks)
plt.xlim(-2, len(ticks)*2) 
plt.ylim(0, 8) 
plt.tight_layout() 
plt.savefig('boxcompare.png') 

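I don't have enough reputation to post images here. You can run the code and look at the result. It is basically very similar to what Molly did.

Note that, depending on the version of Python you are using, you may need `range` (Python 3) instead of `xrange` (Python 2).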

Result of this code

+2

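It looks like you don't use the variables `mu` and `alpha`. Otherwise I really like your solution, because it is close to a generic one: only the number of grouped categories has to be adjusted in the code. – Horstinator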

+1

This is better and more robust than what Molly proposed. – durbachit

+0

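This is the best solution among all the answers on this page. As @Horstinator pointed out, it does not require the same number of samples for apples and oranges. – Chris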
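Following up on the point that this approach is close to generic, here is a sketch that computes the positions for any number of subgroups instead of hard-coding the ±0.4 offsets. `grouped_positions`, the third subgroup (`Pears`) and all the data are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np


def grouped_positions(n_groups, n_sub, gap=1.0):
    """Return an (n_sub, n_groups) array of x positions (hypothetical helper).

    Boxes are one unit apart inside a group, neighbouring groups are separated
    by an extra `gap`, and row j holds the positions of subgroup j.
    """
    offsets = np.arange(n_sub) - (n_sub - 1) / 2.0   # centred on 0
    centres = np.arange(n_groups) * (n_sub + gap)
    return offsets[:, np.newaxis] + centres[np.newaxis, :]


# data[j][i] holds the samples of subgroup j in group i (made-up numbers)
data = [[[1, 2, 5], [5, 7, 2, 2, 5], [7, 2, 5]],
        [[6, 4, 2], [1, 2, 5, 3, 2], [2, 3, 5, 1]],
        [[3, 3, 4], [4, 6, 5], [1, 2, 2, 3]]]
ticks = ['A', 'B', 'C']
labels = ['Apples', 'Oranges', 'Pears']   # third subgroup is hypothetical
colors = ['#D7191C', '#2C7BB6', '#1A9641']

pos = grouped_positions(len(ticks), len(data))
fig, ax = plt.subplots()
for j, (subgroup, color, label) in enumerate(zip(data, colors, labels)):
    bp = ax.boxplot(subgroup, positions=pos[j], sym='', widths=0.6)
    for key in ('boxes', 'whiskers', 'caps', 'medians'):
        plt.setp(bp[key], color=color)
    ax.plot([], c=color, label=label)   # empty line used only for the legend

ax.set_xticks(pos.mean(axis=0))   # one tick at the centre of each group
ax.set_xticklabels(ticks)
ax.legend()
```

Adding another subgroup then only means appending to `data`, `labels` and `colors`; the spacing adapts automatically.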

1

Here is a function I wrote that combines Molly's code with some other code I found on the internet to make slightly fancier grouped boxplots:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

def custom_legend(colors, labels, linestyles=None):
    """ Creates a list of matplotlib Patch objects that can be passed to the legend(...) function to create a custom
        legend.

    :param colors: A list of colors, one for each entry in the legend.
    :param labels: A list of labels, one for each entry in the legend.
    :param linestyles: (Optional) A list of line styles such as 'solid' or 'dashed', one for each entry.
    """

    if linestyles is not None:
        assert len(linestyles) == len(colors), "Length of linestyles must match length of colors."

    h = list()
    for k, (c, l) in enumerate(zip(colors, labels)):
        ls = 'solid'
        if linestyles is not None:
            ls = linestyles[k]
        patch = patches.Patch(color=c, label=l, linestyle=ls)
        h.append(patch)
    return h


def grouped_boxplot(data, group_names=None, subgroup_names=None, ax=None, subgroup_colors=None,
                    box_width=0.6, box_spacing=1.0):
    """ Draws a grouped boxplot. The data should be organized in a hierarchy, where there are multiple
        subgroups for each main group.

    :param data: A dictionary of length equal to the number of the groups. The key should be the
                 group name, the value should be a list of arrays. The length of the list should be
                 equal to the number of subgroups.
    :param group_names: (Optional) The group names, should be the same as data.keys(), but can be ordered.
    :param subgroup_names: (Optional) Names of the subgroups.
    :param subgroup_colors: A list specifying the plot color for each subgroup.
    :param ax: (Optional) The axis to plot on.
    """

    if group_names is None:
        group_names = list(data.keys())

    if ax is None:
        ax = plt.gca()
    plt.sca(ax)

    nsubgroups = np.array([len(v) for v in data.values()])
    assert len(np.unique(nsubgroups)) == 1, "Number of subgroups for each property differ!"
    nsubgroups = nsubgroups[0]

    if subgroup_colors is None:
        subgroup_colors = list()
        for k in range(nsubgroups):
            subgroup_colors.append(np.random.rand(3))
    else:
        assert len(subgroup_colors) == nsubgroups, "subgroup_colors length must match number of subgroups (%d)" % nsubgroups

    def _decorate_box(_bp, _d):
        plt.setp(_bp['boxes'], lw=0, color='k')
        plt.setp(_bp['whiskers'], lw=3.0, color='k')

        # fill in each box with a color
        assert len(_bp['boxes']) == nsubgroups
        for _k, _box in enumerate(_bp['boxes']):
            # the box outline has 5 vertices; list() is needed because zip is lazy in Python 3
            _boxCoords = list(zip(_box.get_xdata()[:5], _box.get_ydata()[:5]))
            _boxPolygon = plt.Polygon(_boxCoords, facecolor=subgroup_colors[_k])
            ax.add_patch(_boxPolygon)

        # draw a black line for the median (once, after collecting both end points)
        for _k, _med in enumerate(_bp['medians']):
            _medianX = _med.get_xdata()[:2]
            _medianY = _med.get_ydata()[:2]
            plt.plot(_medianX, _medianY, 'k', linewidth=3.0)

            # draw a black asterisk for the mean
            plt.plot([np.mean(_med.get_xdata())], [np.mean(_d[_k])], color='w', marker='*',
                     markeredgecolor='k', markersize=12)

    cpos = 1
    label_pos = list()
    for k in group_names:
        d = data[k]
        nsubgroups = len(d)
        pos = np.arange(nsubgroups) + cpos
        label_pos.append(pos.mean())
        bp = plt.boxplot(d, positions=pos, widths=box_width)
        _decorate_box(bp, d)
        cpos += nsubgroups + box_spacing

    plt.xlim(0, cpos - 1)
    plt.xticks(label_pos, group_names)

    if subgroup_names is not None:
        leg = custom_legend(subgroup_colors, subgroup_names)
        plt.legend(handles=leg)

You can use the function like this:

data = { 'A':[np.random.randn(100), np.random.randn(100) + 5], 
     'B':[np.random.randn(100)+1, np.random.randn(100) + 9], 
     'C':[np.random.randn(100)-3, np.random.randn(100) -5] 
     } 

grouped_boxplot(data, group_names=['A', 'B', 'C'], subgroup_names=['Apples', 'Oranges'], subgroup_colors=['#D02D2E', '#D67700']) 
plt.show() 
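An alternative to copying each box outline into a `Polygon` (a swapped-in technique, not the method used above): `boxplot(..., patch_artist=True)` draws the boxes as patches whose face color can be set directly, and `showmeans` with `meanprops` draws the mean marker. A sketch with the same made-up `data` dictionary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np

# same made-up data layout: group name -> list of subgroup samples
data = {'A': [np.random.randn(100), np.random.randn(100) + 5],
        'B': [np.random.randn(100) + 1, np.random.randn(100) + 9],
        'C': [np.random.randn(100) - 3, np.random.randn(100) - 5]}
subgroup_colors = ['#D02D2E', '#D67700']

fig, ax = plt.subplots()
label_pos = []
cpos = 1
for name, samples in data.items():
    pos = np.arange(len(samples)) + cpos
    bp = ax.boxplot(samples, positions=pos, widths=0.6, patch_artist=True,
                    showmeans=True,
                    medianprops={'color': 'k', 'linewidth': 3.0},
                    meanprops={'marker': '*', 'markerfacecolor': 'w',
                               'markeredgecolor': 'k', 'markersize': 12})
    for box, color in zip(bp['boxes'], subgroup_colors):
        box.set_facecolor(color)   # boxes are PathPatch objects here
    label_pos.append(pos.mean())
    cpos += len(samples) + 1       # one empty slot between groups

ax.set_xticks(label_pos)
ax.set_xticklabels(list(data))
ax.set_xlim(0, cpos - 1)
```

Styling the artists that `boxplot` returns avoids drawing extra polygons and median lines on top of the plot.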
3

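Just to add to the conversation: I found a more elegant way to change the colors of a boxplot by iterating over the dictionary the boxplot itself returns.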

import numpy as np
import matplotlib.pyplot as plt

def color_box(bp, color):

    # Define the elements to color. You can also add medians, fliers and means
    elements = ['boxes', 'caps', 'whiskers']

    # Iterate over each of the elements, changing the color of all its artists
    for elem in elements:
        plt.setp(bp[elem], color=color)

a = np.random.uniform(0, 10, [100, 5])

bp = plt.boxplot(a)
color_box(bp, 'red')
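Instead of changing the artists afterwards, `boxplot` also accepts `boxprops`, `whiskerprops` and `capprops` dictionaries (as well as `medianprops` and `flierprops`), which style the elements as they are created. A sketch that gives the same red boxes, caps and whiskers:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np

a = np.random.uniform(0, 10, [100, 5])

# the same line properties for boxes, whiskers and caps
props = {'color': 'red'}
bp = plt.boxplot(a, boxprops=props, whiskerprops=props, capprops=props)
```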

Original box plot

Modified box plot

Cheers!

3

Mock data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'C', 'B', 'B', 'C', 'A', 'C'],
                   'Apple': np.random.rand(10), 'Orange': np.random.rand(10)})
df = df[['Group', 'Apple', 'Orange']]

  Group     Apple    Orange
0     A  0.465636  0.537723
1     A  0.560537  0.727238
2     A  0.268154  0.648927
3     B  0.722644  0.115550
4     C  0.586346  0.042896
5     B  0.562881  0.369686
6     B  0.395236  0.672477
7     C  0.577949  0.358801
8     A  0.764069  0.642724
9     C  0.731076  0.302369

You can make these plots with the Seaborn library. First `melt` the dataframe to reshape the data, then create the boxplot of your choice.

import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns 
dd=pd.melt(df,id_vars=['Group'],value_vars=['Apple','Orange'],var_name='fruits') 
sns.boxplot(x='Group',y='value',data=dd,hue='fruits') 
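To see what `melt` hands over to seaborn, here is the reshaping step on its own with the same kind of mock data: each (Group, fruit) measurement becomes one row, which is the long format that `sns.boxplot(x=..., y=..., hue=...)` expects:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'A', 'B', 'C', 'B', 'B', 'C', 'A', 'C'],
                   'Apple': np.random.rand(10),
                   'Orange': np.random.rand(10)})

# wide -> long: the two fruit columns are stacked into 'fruits' and 'value'
dd = pd.melt(df, id_vars=['Group'], value_vars=['Apple', 'Orange'],
             var_name='fruits')
# 10 rows x 2 fruit columns -> 20 rows with columns Group, fruits, value
print(dd.shape)
print(dd.head())
```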
