Python pandas: select the second smallest value in groupby

I have an example DataFrame like the following:

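import pandas as pd
import numpy as np
# array must be qualified as np.array, since only the numpy module itself is imported
df = pd.DataFrame({'ID': [1, 2, 2, 2, 3, 3], 'date': np.array(['2000-01-01', '2002-01-01', '2010-01-01', '2003-01-01', '2004-01-01', '2008-01-01'], dtype='datetime64[D]')})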

I want to get the second earliest date for each ID group, so I wrote the following function:

def f(x):
    if len(x)==1:
        return x[0]
    else:
        x.sort()
        return x[1]

Then I wrote:

df.groupby('ID').date.apply(lambda x:f(x))

The result is an error.

Can you find a way to make this work?

Source

2014-07-24 midtownguru

Use nsmallest, added in 0.14.1: https://github.com/pydata/pandas/pull/7356 – Jeff

This requires 0.14.1. It is also quite efficient, especially if you have large groups, because it does not require a full sort.

In [32]: df.groupby('ID')['date'].nsmallest(2) 
Out[32]: 
ID
1   0   2000-01-01
2   1   2002-01-01
    3   2003-01-01
3   4   2004-01-01
    5   2008-01-01
dtype: datetime64[ns] 

In [33]: df.groupby('ID')['date'].nsmallest(2).groupby(level='ID').last() 
Out[33]: 
ID 
1 2000-01-01 
2 2003-01-01 
3 2008-01-01 
dtype: datetime64[ns]
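
As a self-contained variant of the same idea, here is a minimal sketch that applies nsmallest inside each group and takes the last of the (up to two) values returned. The helper name second_smallest is hypothetical, not part of pandas, and the sketch assumes a reasonably recent pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'date': pd.to_datetime(['2000-01-01', '2002-01-01', '2010-01-01',
                            '2003-01-01', '2004-01-01', '2008-01-01']),
})

def second_smallest(s):
    # Hypothetical helper: nsmallest(2) returns at most two values in
    # ascending order, so the last one is the second smallest, or the
    # only value when the group has a single row.
    return s.nsmallest(2).iloc[-1]

result = df.groupby('ID')['date'].apply(second_smallest)
print(result)
```

A group with a single row falls back to its only date, which matches the behaviour of the function f in the question.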

Source

2014-07-24 22:16:21 Jeff

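You can also pass a list – Jeff

Take a look at the indexing docs. In general, pandas defaults to label-based indexing rather than positional indexing, which is why you get a KeyError.

In your particular case, you can use .iloc for position-based indexing.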

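In [266]: def f(x):
     ...:     if len(x)==1:
     ...:         return x.iloc[0]
     ...:     else:
     ...:         # Series.sort() was removed in later pandas versions;
     ...:         # sort_values() returns a sorted copy instead
     ...:         x = x.sort_values()
     ...:         return x.iloc[1]
     ...: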

In [267]: df.groupby('ID').date.apply(f) 
Out[267]: 
ID 
1 2000-01-01 
2 2003-01-01 
3 2008-01-01 
Name: date, dtype: datetime64[ns]
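
To make the label versus position distinction concrete, here is a small sketch. The Series s is made up for illustration and mimics the ID == 2 group, whose index labels are 1, 2, 3 rather than starting at 0.

```python
import pandas as pd

# Illustrative Series whose integer index labels do not include 0
s = pd.Series(pd.to_datetime(['2002-01-01', '2010-01-01', '2003-01-01']),
              index=[1, 2, 3])

try:
    s[0]              # label lookup on an integer index: there is no label 0
    raised = False
except KeyError:
    raised = True

first_by_position = s.iloc[0]   # first element, regardless of labels
first_by_label = s.loc[1]       # element whose label is 1

print(raised, first_by_position, first_by_label)
```

Inside groupby(...).apply, each group keeps the index labels of the original DataFrame, so x[0] only happens to work for a group that contains the row labelled 0.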

Source

2014-07-24 21:10:54 chrisb

The specific part of the docs about '.iloc' vs. '.loc' is here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing-loc-iloc-and-ix – jmduke
