Sorting categorical labels after groupby in pandas

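I am using pd.cut to discretize a data set, and that part works fine. My question is about the Categorical type that pd.cut returns. The documentation says a Categorical is treated like an array of strings, so I am not surprised to see the labels sorted lexically when grouped.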

For example, the following code:

produces the following chart:

[image: chart of the grouped counts]

(Note 500 - 999 in the middle.)
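The ordering problem can be illustrated with plain strings, independent of pandas. This is only a sketch with a shortened, made-up label list: sorting the labels as text puts "500 - 999" after every label that starts with "1", while sorting on the leading number restores the bin order.

```python
# Illustrative labels only; the real data set has more bins.
labels = ["0 - 499", "500 - 999", "1000 - 1499", "1500 - 1999"]

# Plain string comparison goes character by character, so "1000" < "500".
lexical = sorted(labels)

# Sorting on the integer before the dash gives the intended bin order.
numeric = sorted(labels, key=lambda s: int(s.split("-")[0]))

print(lexical)
print(numeric)
```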

Before grouping, the column is in the order I expect:

In [94]: df['value_group'] 
Out [94]: 
59  0 - 499 
58  0 - 499 
0  500 - 999 
94  500 - 999 
76  500 - 999 
95  1000 - 1499 
17  1000 - 1499 
48  1000 - 1499

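I have played with this for a while, and the only way I have found to avoid it is to prefix each label with a leading letter, e.g. ['A) 0 - 499', 'B) 500-999', ... ], which makes me cringe. The other thing I looked into is supplying a custom groupby implementation, which does not seem possible (or like the right thing to do). What am I missing?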

Source

2014-05-22 Bill

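Please have a look at the following PR, which is currently in progress in pandas: https://github.com/pydata/pandas/pull/7217. This type of operation should work with that PR (if not, this is a bug...)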
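For readers on a pandas release that has the categorical dtype, here is a minimal sketch of how this behaves there. The values, bin edges and labels are made up for illustration, and `observed=False` is passed explicitly, which assumes a reasonably recent pandas. pd.cut returns an ordered categorical, and grouping on it follows the category order rather than the text of the labels.

```python
import pandas as pd

# Made-up sample values and bins, for illustration only.
df = pd.DataFrame({"value": [1200, 30, 750, 1999, 499, 500, 1000, 0]})
edges = [0, 500, 1000, 1500, 2000]
labels = ["0 - 499", "500 - 999", "1000 - 1499", "1500 - 1999"]

# right=False makes each bin closed on the left: [0, 500), [500, 1000), ...
df["value_group"] = pd.cut(df["value"], bins=edges, labels=labels, right=False)

# Grouping on a categorical column orders the groups by category, not by text.
counts = df.groupby("value_group", observed=False).size()
print(counts)
```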

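This bit me too. The right fix is probably to improve native support for Categorical objects, but in the meantime I work around it in practice by doing a final sort: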

In [104]: z = df.groupby('value_group').size() 

In [105]: z[sorted(z.index, key=lambda x: float(x.split()[0]))] 
Out[105]: 
0 - 499  5 
500 - 999  6 
1000 - 1499 4 
1500 - 1999 6 
2000 - 2499 4 
2500 - 2999 6 
3000 - 3499 3 
3500 - 3999 3 
4000 - 4499 2 
4500 - 4999 6 
5000 - 5499 6 
5500 - 5999 5 
6000 - 6499 6 
6500 - 6999 2 
7000 - 7499 9 
7500 - 7999 3 
8000 - 8499 7 
8500 - 8999 6 
9000 - 9499 5 
9500 - 9999 6 
dtype: int64 

In [106]: z[sorted(z.index, key=lambda x: float(x.split()[0]))].plot(kind='bar') 
Out[106]: <matplotlib.axes.AxesSubplot at 0xbe87d30>

demo with better order
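A self-contained version of the same workaround, as a sketch on made-up string labels. The helper name `lower_bound` is hypothetical and not part of pandas; it just pulls out the number before the dash so the grouped result can be reordered.

```python
import pandas as pd

def lower_bound(label):
    # Hypothetical helper: "500 - 999" -> 500.0
    return float(label.split()[0])

# Made-up labels stored as plain strings, so groupby sorts them as text.
df = pd.DataFrame({"value_group": [
    "500 - 999", "0 - 499", "1000 - 1499",
    "500 - 999", "1000 - 1499", "1000 - 1499",
]})

z = df.groupby("value_group").size()

# Reorder the result by the numeric lower bound of each label.
ordered = z.loc[sorted(z.index, key=lower_bound)]
print(ordered)
```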

Source

2014-05-22 18:47:43 DSM

You can apply a custom sort to your own data. For example:

group = df.groupby(['value_group'])['value_group'].count() 
# reindex_axis has since been removed from pandas; reindex does the same job here
sortd = group.reindex(sorted(group.index, key=lambda x: int(x.split("-")[0])))

Then, if you plot the sortd Series, it works.

Source

2014-05-22 18:46:37 grasshopper

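For anyone who actually makes it down to this part of the answers: just add the sort=False argument, which keeps the groups in the order they first appear in the data instead of sorting the labels:

df.groupby(['value_group'], sort=False)['value_group'].count().plot(kind='bar')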
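A small sketch of what this does, on made-up data. The groupby keyword in pandas is `sort`; with `sort=False` the groups come out in order of first appearance, so this only gives the bin order if the rows are already arranged that way.

```python
import pandas as pd

# Made-up rows whose labels first appear in bin order.
df = pd.DataFrame({
    "value": [30, 499, 750, 1200, 1000, 600],
    "value_group": ["0 - 499", "0 - 499", "500 - 999",
                    "1000 - 1499", "1000 - 1499", "500 - 999"],
})

# Default behaviour: group keys are sorted, here as plain strings.
default_order = list(df.groupby("value_group").size().index)

# sort=False: group keys keep the order in which they first appear.
appearance_order = list(df.groupby("value_group", sort=False).size().index)

print(default_order)
print(appearance_order)
```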

Source

2017-10-18 16:05:42
