2016-11-26 62 views

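Have a boxplot, want to label the median and whisker values

I have a pandas DataFrame containing Facebook post data broken down by post type. The DataFrame is called `Posts_by_type` and contains the number of likes, the number of shares, and the post type. There are three types of post: racing, entertainment, and promotion.

I want to create a boxplot in matplotlib showing the number of likes for each type of post.

My code works: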

Posts_by_type.boxplot(column='Likes', by='Type', grid=True) 

This produces the following boxplot:

[Image: boxplot of Likes grouped by Type]

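However, I would also like to label the median and the whiskers on the boxplot with their corresponding numeric values.

Is this possible in matplotlib? If so, can anyone give me some pointers on how to do it?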


Related questions, both of which show solutions to the problem: [Here](http://stackoverflow.com/a/38649932/4124317) and [here](http://stackoverflow.com/a/18861734/4124317). You need to argue why they do not apply in your case. – ImportanceOfBeingErnest

Answer


One solution, which also adds the values for the boxes.

import random 
import string 
import matplotlib.pyplot as plt 
import pandas as pd 
import numpy as np 

def get_x_tick_labels(df, grouped_by): 
    tmp = df.groupby([grouped_by]).size() 
    return ["{0}: {1}".format(k,v) for k, v in tmp.to_dict().items()] 

def series_values_as_dict(series_object): 
    tmp = series_object.to_dict().values() 
    return [y for y in tmp][0] 

def generate_dataframe(): 
    # Create a pandas dataframe... 
    _likes = [random.randint(0,300) for _ in range(100)] 
    _type = [random.choice(string.ascii_uppercase[:5]) for _ in range(100)] 
    _shares = [random.randint(0,100) for _ in range(100)] 
    return pd.DataFrame(
        {'Likes': _likes,
         'Type': _type,
         'shares': _shares
         })

def add_values(bp, ax):
    """Add the numeric values to the various points of the boxplots."""
    for element in ['whiskers', 'medians', 'caps']:
        for line in bp[element]:
            # Get the position of the element; y is the value to label.
            # For whiskers this first point is the box edge (quartile).
            (x_l, y), (x_r, _) = line.get_xydata()
            # Make sure datapoints exist
            # (I've been working with intervals, should not be a problem for this case)
            if not np.isnan(y):
                x_line_center = x_l + (x_r - x_l) / 2
                y_line_center = y  # Medians and caps are horizontal lines
                # Overlay the value: on the line, from center to right
                ax.text(x_line_center, y_line_center,  # Position
                        '%.3f' % y,  # Value (3f = 3 decimal float)
                        verticalalignment='center',  # Centered vertically with line
                        fontsize=16, backgroundcolor="white")

posts_by_type = generate_dataframe() 


fig, axes = plt.subplots(1, figsize=(20, 10)) 

# Note: `labels` is not defined yet at this point, so it is not passed here;
# the tick labels are set with plt.xticks further down.
bp_series = posts_by_type.boxplot(column='Likes', by='Type',
                                  grid=True, ax=axes, return_type='dict')
# With by=..., pandas returns a Series of dicts (one per column), not a dict, so...
bp_dict = series_values_as_dict(bp_series)
#Now add the values 
add_values(bp_dict, axes) 
# Set a label on X-axis for each boxplot 
labels = get_x_tick_labels(posts_by_type, 'Type') 
plt.xticks(range(1, len(labels) + 1), labels) 
# Change some other texts on the graphs? 
plt.title('Likes per type of post', fontsize=22) 
plt.xlabel('Type', fontsize=18) 
plt.ylabel('Likes', fontsize=18) 
plt.suptitle('This is a pretty graph') 
plt.show() 

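As a possible alternative to reading the values back from the drawn lines, the statistics can be computed first and then both drawn and labeled from the same numbers. The following is only a minimal sketch under a few assumptions: the sample data, the helper name `labeled_boxplot`, and the 0.3 horizontal text offset are all made up for illustration, and it draws with plain matplotlib (`matplotlib.cbook.boxplot_stats` plus `Axes.bxp`) rather than with `DataFrame.boxplot`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib import cbook

# Hypothetical sample data standing in for Posts_by_type
posts = pd.DataFrame({
    "Type": ["racing"] * 5 + ["entertainment"] * 5 + ["promotion"] * 5,
    "Likes": [10, 20, 30, 40, 50,
              5, 15, 25, 35, 45,
              100, 110, 120, 130, 140],
})

def labeled_boxplot(df, column, by, ax):
    """Draw one box per group and label the median and both whisker ends."""
    groups = sorted(df[by].unique())
    data = [df.loc[df[by] == g, column].to_numpy() for g in groups]
    # One dict per group with keys such as 'med', 'q1', 'q3', 'whislo', 'whishi'
    stats = cbook.boxplot_stats(data, labels=groups)
    ax.bxp(stats)  # boxes are drawn at x positions 1..N by default
    for pos, s in enumerate(stats, start=1):
        for key in ("whislo", "med", "whishi"):
            # Place the text slightly to the right of the box (assumed offset)
            ax.text(pos + 0.3, s[key], f"{s[key]:.1f}", va="center")
    return stats

fig, ax = plt.subplots(figsize=(8, 5))
stats = labeled_boxplot(posts, "Likes", "Type", ax)
ax.set_xlabel("Type")
ax.set_ylabel("Likes")
```

To view the result, call `fig.savefig(...)` with a file name of your choice, or drop the `matplotlib.use("Agg")` line and call `plt.show()`. Because the labels come from the same `stats` dicts that `bxp` draws, the printed numbers cannot drift from the plotted ones.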