Unable to get the top n records of a dataset using groupby and nlargest

I have a dataframe that looks like this.

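The data provided here include hospital-specific charges for more than 3,000 U.S. hospitals that receive Medicare Inpatient Prospective Payment System (IPPS) payments, paid at a rate per discharge using the Medicare Severity Diagnosis Related Group (MS-DRG) for Fiscal Year (FY) 2011.

I ran the commands below on the dataframe to get the top two records by total discharges: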

dataframe_1 = dataframe.groupby('Provider Id').sum() 
dataframe_1.nlargest(2,'Total Discharges')

I get the following error:

C:\Users\user\Anaconda2\lib\site-packages\pandas\core\indexes\base.pyc in get_loc(self, key, method, tolerance) 
    2393     return self._engine.get_loc(key) 
    2394    except KeyError: 
-> 2395     return self._engine.get_loc(self._maybe_cast_indexer(key)) 
    2396 
    2397   indexer = self.get_indexer([key], method=method, tolerance=tolerance) 

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)() 

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)() 

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)() 

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)() 

KeyError: 'Total Discharges'

Any help understanding the error is welcome!


2017-07-26 GOLDEN SPARROW

Do you want the largest 'Total Discharges' within each group? If not, you need to run dataframe.nlargest(2, 'Total Discharges'), because dataframe_1 is a grouped dataframe – VinceP

Yes, I need the top n records within each group. However, even dataframe.nlargest(2, 'Total Discharges') raises the same error. –

Check the dtype of the 'Total Discharges' column in the dataframe. sum() only operates on numeric columns. –
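The dtype check suggested in the comment above could look roughly like the sketch below. The sample values are invented for illustration, and the column is stored as strings on purpose; whether the real file has the same problem is an assumption, not something the question confirms.

```python
import pandas as pd

# Hypothetical sample data; the real dataset has many more rows and columns.
dataframe = pd.DataFrame({
    "Provider Id": [10001, 10001, 10005, 10006],
    "Total Discharges": ["91", "14", "24", "30"],  # strings, not numbers
})

# Inspect the dtype and the exact column labels before aggregating.
print(dataframe["Total Discharges"].dtype)
print(list(dataframe.columns))

# Convert to a numeric dtype; values that cannot be parsed become NaN.
dataframe["Total Discharges"] = pd.to_numeric(
    dataframe["Total Discharges"], errors="coerce"
)
print(dataframe["Total Discharges"].dtype)
```

Printing the column labels is included because a KeyError can also come from a label that does not match exactly, for example one with stray whitespace.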

dataframe_1 = dataframe.groupby('Provider Id', as_index=False).sum() 
dataframe_1.nlargest(2,'Total Discharges')

VinceP has already pointed out the root cause.
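As a minimal sketch of the two readings discussed above, assuming 'Total Discharges' is already numeric: the first part sums per provider and keeps the top two providers, the second keeps the top two rows inside each provider. The sample values and the names totals, top_providers, and top_per_group are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a numeric 'Total Discharges' column.
dataframe = pd.DataFrame({
    "Provider Id": [10001, 10001, 10001, 10005, 10005, 10006],
    "Total Discharges": [91, 14, 40, 24, 30, 50],
})

# Reading 1: the two providers with the largest summed discharges.
# as_index=False keeps 'Provider Id' as a regular column in the result.
totals = dataframe.groupby("Provider Id", as_index=False).sum()
top_providers = totals.nlargest(2, "Total Discharges")

# Reading 2: the two largest rows within each provider.
# Sort descending, then take the first two rows of every group.
top_per_group = (
    dataframe.sort_values("Total Discharges", ascending=False)
    .groupby("Provider Id")
    .head(2)
)

print(top_providers)
print(top_per_group)
```

The sort-then-head approach is one of several ways to get per-group top n; it was chosen here because it keeps the original rows and columns intact.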


2017-07-26 15:05:49
