2017-08-03 37 views
0

How to order column names that are strings in a matplotlib bar chart

This is an example of some of my data:

from pandas import * 
df = DataFrame({"Experience":['8 to 9 years', '12 to 13 years', '13 to 14 years', '17 to 18 years', 
       '5 to 6 years', '19 to 20 years', '20 or more years', '14 to 15 years', '3 to 4 years', 
       '10 to 11 years', 'Less than a year', '4 to 5 years', '6 to 7 years', 
       '2 to 3 years', '15 to 16 years', '11 to 12 years', '16 to 17 years', '18 to 19 years', 
       '1 to 2 years', '9 to 10 years', '7 to 8 years', '8 to 9 years', 
       '12 to 13 years', '13 to 14 years', '14 to 15 years', '3 to 4 years', 
       '17 to 18 years', '5 to 6 years', '19 to 20 years', '20 or more years', 
       '10 to 11 years', 'Less than a year', '4 to 5 years', '6 to 7 years', 
       '2 to 3 years', '15 to 16 years', '11 to 12 years', '16 to 17 years', 
       '18 to 19 years', '1 to 2 years', '9 to 10 years', '7 to 8 years'], 
       "Salary":[50000, 20000, 80000, 60000, 70000, 50000, 45000, 47000, 36000, 74000, 50000, 20000, 80000, 
         60000, 70000, 50000, 45000, 47000, 36000, 74000, 90000, 50000, 20000, 80000, 60000, 70000, 
         50000, 45000, 47000, 36000, 74000, 50000, 20000, 80000, 60000, 70000, 50000, 45000, 60000, 
         70000, 50000, 45000]}) 
df 

df['Salary'] = df['Salary'].astype('int64') 

This is the bar chart I made to compare the median salary for each experience level:

from numpy import median 
%matplotlib inline 
group = df.groupby('Experience') 
group.aggregate(median).plot(kind='barh') 

This gave me this graph:

graph

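I would like the column names in the bar chart to appear in order (e.g. "Less than a year", "1 to 2 years", and so on), but I am struggling. What is the cleanest way for a beginner to do this with pandas?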

+0

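Welcome, new coder! This is clear, but it is not an [MCVE](https://stackoverflow.com/help/mcve). I have an idea for a workaround, but please write up an end-to-end example, i.e. one that has all the imports and some sample data. As a thorough test, it should include at least one "Less than a year" case and one case from each decade. – cphlewis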

+0

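@cphlewis Thank you very much for taking the time to explain this to me. I hope the edited question is more in line with what the community expects. If you think you have an answer, it would be greatly appreciated. – BadAtCoding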

Answers

1

Two approaches: the first is automatic; the second sorts by one scheme and labels with another.

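The original problem is that the bar plot is given the text strings in "Experience" and sorts them alphabetically. We want numeric order. The quick way is to extract a number from each string (using the function to_min_number), group and plot by that number, and then edit the axis labels so the graph is still self-explanatory.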

from pandas import * 
from matplotlib.pyplot import show 
df = DataFrame({"Experience":['8 to 9 years', '12 to 13 years', '13 to 14 years', '17 to 18 years', 
       '5 to 6 years', '19 to 20 years', '20 or more years', '14 to 15 years', '3 to 4 years', 
       '10 to 11 years', 'Less than a year', '4 to 5 years', '6 to 7 years', 
       '2 to 3 years', '15 to 16 years', '11 to 12 years', '16 to 17 years', '18 to 19 years', 
       '1 to 2 years', '9 to 10 years', '7 to 8 years', '8 to 9 years', 
       '12 to 13 years', '13 to 14 years', '14 to 15 years', '3 to 4 years', 
       '17 to 18 years', '5 to 6 years', '19 to 20 years', '20 or more years', 
       '10 to 11 years', 'Less than a year', '4 to 5 years', '6 to 7 years', 
       '2 to 3 years', '15 to 16 years', '11 to 12 years', '16 to 17 years', 
       '18 to 19 years', '1 to 2 years', '9 to 10 years', '7 to 8 years'], 
       "Salary":[50000, 20000, 80000, 60000, 70000, 50000, 45000, 47000, 36000, 74000, 50000, 20000, 80000, 
         60000, 70000, 50000, 45000, 47000, 36000, 74000, 90000, 50000, 20000, 80000, 60000, 70000, 
         50000, 45000, 47000, 36000, 74000, 50000, 20000, 80000, 60000, 70000, 50000, 45000, 60000, 
         70000, 50000, 45000]}) 
df 

df['Salary'] = df['Salary'].astype('int64') 

# Making a new column of Experience values that will plot gracefully 
def to_min_number(experience): 
    t = experience.split(' ')[0] 
    if t == 'Less': return 0 
    return int(t) 

# Series.map returns a Series; the built-in map() returns a length-less iterator in Python 3
df['Minimum experience'] = df['Experience'].map(to_min_number)

from numpy import median 
group = df.groupby('Minimum experience') 
# Select the numeric column explicitly: the text column 'Experience' has no median
barplot = group[['Salary']].median().plot(kind='barh', legend=False)
barplot.set_ylabel('Minimum years experience, non-overlapping') 
barplot.set_xlabel('Salary, USD') 
show() 
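
To see why the default ordering goes wrong, here is a minimal sketch that compares plain string sorting with sorting on the extracted number. It reuses the to_min_number helper from above; the short list of labels is a made-up subset chosen only for illustration.

```python
def to_min_number(experience):
    """Return the lower bound of an experience range as an int."""
    first = experience.split(' ')[0]
    if first == 'Less':
        return 0
    return int(first)

# Illustrative subset of the labels in the question
labels = ['10 to 11 years', '2 to 3 years', 'Less than a year',
          '20 or more years', '1 to 2 years']

alphabetical = sorted(labels)                 # compares character by character
numeric = sorted(labels, key=to_min_number)   # compares the extracted lower bound

print(alphabetical)
print(numeric)
```

In the alphabetical result '10 to 11 years' lands before '2 to 3 years' and 'Less than a year' comes last, which is the same kind of disorder seen in the original chart.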


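If you must have the original text strings, you can change the text of the y tick labels back to the strings that correspond to the values in the Minimum experience column. Automatic pandas plotting only makes room for numeric labels, so we force more space into the left margin of the axes we are plotting into: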

# We are overriding the barplot defaults, so enforcing a new axis layout
from matplotlib.pyplot import subplots, subplots_adjust
fig, ax = subplots()
subplots_adjust(left=0.3)  # Argument is proportion of figure width; found by trial-and-error

# Select 'Salary' explicitly so the text column is not aggregated; pass the plot our ax
barplot = group[['Salary']].median().plot(ax=ax, kind='barh', legend=False)
barplot.set_ylabel('Experience') 
barplot.set_xlabel('Salary, USD') 

# Need a list of new tick labels in lower-to-upper order. Use the group object, since we have it: 
labellist = [] 
for i, v in group: 
    labellist.append({'I':int(i), 'T':v.Experience.values[0]}) 
labeldf = DataFrame(labellist) 

barplot.set_yticklabels(labeldf.sort_values(by='I')['T']) 


show() 
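
A different route, not part of the answer above, is to keep the text labels and tell pandas what order they go in by making the column an ordered categorical. The sketch below assumes a small made-up sample (the name `sample` and its values are illustrative) and reuses the same lower-bound idea to derive the order.

```python
import pandas as pd

# Made-up sample for illustration; the real data would be the df from the question
sample = pd.DataFrame({
    'Experience': ['2 to 3 years', 'Less than a year', '10 to 11 years',
                   '1 to 2 years', '2 to 3 years', 'Less than a year'],
    'Salary': [36000, 50000, 74000, 47000, 40000, 30000],
})

def to_min_number(experience):
    """Return the lower bound of an experience range as an int."""
    first = experience.split(' ')[0]
    return 0 if first == 'Less' else int(first)

# Sort the distinct labels by lower bound, then use that as the category order
ordered_labels = sorted(sample['Experience'].unique(), key=to_min_number)
sample['Experience'] = pd.Categorical(sample['Experience'],
                                      categories=ordered_labels, ordered=True)

# groupby sorts an ordered categorical key by category order, not alphabetically
medians = sample.groupby('Experience', observed=True)['Salary'].median()
print(medians)
```

Calling `medians.plot(kind='barh')` on the result should then draw the bars in category order with the original text labels, so no tick relabelling is needed.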


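Note that if the original text strings were not generated by a program with a limited set of choices, you should do more checking for variants: what if someone wrote "Up to 1 year"? Or "More than 20 years"?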
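
One possible way to handle such variants is a more forgiving parser. The following is only a sketch: the function name parse_min_years and the list of recognised phrases are assumptions, and real free-text data would likely need more rules.

```python
import re

def parse_min_years(text):
    """Hypothetical helper: best-effort lower bound, in years, of a free-text label.

    Returns None when no rule applies, so odd entries can be reviewed by hand.
    """
    s = text.strip().lower()
    # Phrases meaning "under one year", e.g. 'Less than a year', 'Up to 1 year'
    if re.match(r'(less than|under|up to)\s+(a|1|one)\s+year', s):
        return 0
    # Otherwise take the first integer, e.g. '8 to 9 years', 'More than 20 years'
    m = re.search(r'\d+', s)
    if m:
        return int(m.group())
    return None

for label in ['Less than a year', 'Up to 1 year', '8 to 9 years',
              'More than 20 years', '20 or more years', 'not sure']:
    print(label, '->', parse_min_years(label))
```

Returning None instead of raising makes it easy to list the unparsed labels first and decide how to treat them before plotting.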