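Pandas: how to select groups with a small standard deviation within the group?
I have a DataFrame with roughly 100 rows per group ID. I want to group by group ID and then keep only the groups in which the within-group standard deviation of selected columns is below a threshold. I used the code below: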
# df is the dataframe with all rows
# group on groupID
df_grouped = df.groupby('groupID')
# this gives a table with groupID and the std within a group
df_grouped_std = df_grouped.std()
# from the df with standard deviations, I select only the groups
# where the standard deviation is within limits (both conditions in one mask)
selection = df_grouped_std[(df_grouped_std['col1'] < 1) & (df_grouped_std['col2'] < 0.05)]
# now I try to select from the original dataframe 'df_grouped' the groups that were selected in the previous step.
df_plot = df_grouped[selection]
Stack trace:
Traceback (most recent call last):
File "<ipython-input-72-2cd045ecb262>", line 1, in <module>
runfile('C:/Documents and Settings/a708818/Desktop/coloredByRol.py', wdir='C:/Documents and Settings/a708818/Desktop')
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 682, in runfile
execfile(filename, namespace)
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 71, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/Documents and Settings/a708818/Desktop/coloredByRol.py", line 50, in <module>
df_plot = df_grouped[selection]
File "C:\Anaconda\lib\site-packages\pandas\core\groupby.py", line 3170, in __getitem__
if key not in self.obj:
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 688, in __contains__
return key in self._info_axis
File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 885, in __contains__
hash(key)
File "C:\Anaconda\lib\site-packages\pandas\core\generic.py", line 647, in __hash__
' hashed'.format(self.__class__.__name__))
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
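The last frames of the traceback show the cause. `df_grouped[...]` is column selection, so pandas checks whether the key is a column label, and that membership check hashes the key. A DataFrame is mutable and deliberately unhashable. The snippet below reproduces only that hashing step on a small hypothetical frame. The exact error text may vary between pandas versions.

```python
import pandas as pd

# Hypothetical frame, only used to show that a DataFrame cannot be hashed
df = pd.DataFrame({"groupID": [1, 1, 2, 2], "col1": [0.1, 0.2, 5.0, 9.0]})

# DataFrame.__hash__ raises TypeError, so a DataFrame can't be a lookup key
try:
    hash(df)
    hashable = True
except TypeError:
    hashable = False

print(hashable)  # False
```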
I can't figure out how to select the data I want. Any hints?
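One way to keep the two-step approach: the index of the per-group std table holds the `groupID` values, so the surviving IDs can select rows from the original `df` with `Series.isin`. This is a sketch on hypothetical sample data, reusing the question's column names and thresholds. It selects `col1` and `col2` explicitly before `.std()` on the assumption that the real frame may contain non-numeric columns.

```python
import pandas as pd

# Hypothetical sample data: group "a" has a small spread, group "b" does not
df = pd.DataFrame({
    "groupID": ["a", "a", "a", "b", "b", "b"],
    "col1": [1.0, 1.1, 0.9, 1.0, 5.0, 9.0],
    "col2": [0.50, 0.51, 0.49, 0.5, 0.9, 0.1],
})

# Per-group standard deviations; the result is indexed by groupID
df_grouped_std = df.groupby("groupID")[["col1", "col2"]].std()

# Combine both conditions in a single boolean mask
mask = (df_grouped_std["col1"] < 1) & (df_grouped_std["col2"] < 0.05)
selected_ids = df_grouped_std.index[mask]

# Keep only the rows of the original frame whose groupID passed the test
df_plot = df[df["groupID"].isin(selected_ids)]
print(df_plot)
```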
The solution using filter looks cleaner. Thanks! – marqram
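The filter-based answer the comment refers to is not included here. A sketch of that approach with `DataFrameGroupBy.filter` might look like the following. It is not necessarily the answer's exact code. The sample data and the helper name `small_spread` are hypothetical, and the thresholds come from the question.

```python
import pandas as pd

# Hypothetical sample data: group "a" has a small spread, group "b" does not
df = pd.DataFrame({
    "groupID": ["a", "a", "a", "b", "b", "b"],
    "col1": [1.0, 1.1, 0.9, 1.0, 5.0, 9.0],
    "col2": [0.50, 0.51, 0.49, 0.5, 0.9, 0.1],
})

def small_spread(group: pd.DataFrame) -> bool:
    # True if both columns vary little within this group
    return bool(group["col1"].std() < 1 and group["col2"].std() < 0.05)

# filter returns the original rows of every group for which the function is True
df_plot = df.groupby("groupID").filter(small_spread)
print(df_plot)
```

Unlike the two-step version, `filter` needs no intermediate std table or ID lookup, which is likely why it reads more cleanly.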