2015-05-02 60 views

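Both return a DataFrame with the first row of each group. The API reference describes first() as "Compute first of group values", but when I look at the two outputs side by side I don't see any material difference. What is the difference between groupby.first() and groupby.head(1)?

Am I missing something?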

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

API reference for first()
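For anyone who wants to reproduce the comparison, here is a minimal sketch that runs both calls on the frame above. The names by_head, by_first and same_values are only illustrative and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    "value": ["first", "second", "second", "first",
              "second", "first", "third", "fourth",
              "fifth", "second", "fifth", "first",
              "first", "second", "third", "fourth", "fifth"],
})

by_head = df.groupby("id").head(1)    # first row of each group, as rows of df
by_first = df.groupby("id").first()   # first value per column, one row per id

# With no missing data the selected values agree; only the shape differs.
same_values = by_head["value"].tolist() == by_first["value"].tolist()

print(by_head.index.tolist())    # [0, 3, 5, 9, 11, 12, 15]
print(by_first.index.tolist())   # [1, 2, 3, 4, 5, 6, 7]
print(same_values)               # True
```

On this data the two results carry the same values, which is presumably why the outputs look alike; what differs is the index and whether id stays a column.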

Answer


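The main difference is that first() skips ahead to the first non-null value in each group, while head(1) does not: it returns the first row as-is, NaN included.

If I drop an np.nan into the example: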

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value': [np.nan,"second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

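Then we get the following. (Also, as you can see, head(1) keeps the original row index, whereas first() returns a result indexed by id.)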

>>> df.groupby('id').head(1)
    id   value
0    1     NaN   # NaN is included
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth

>>> df.groupby('id').first()
     value
id
1   second   # NaN is skipped
2    first
3    first
4   second
5    first
6    first
7   fourth
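One consequence worth spelling out: first() picks the first non-null value column by column, so with more than one value column it can return a combination that is not any single row of the original frame. Below is a small sketch of that; the frame df2 and its columns a and b are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two different columns.
df2 = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "a": [np.nan, 10.0, 20.0, 30.0],
    "b": ["x", "y", np.nan, "z"],
})

first_rows = df2.groupby("id").head(1)   # actual first rows, NaN kept
first_vals = df2.groupby("id").first()   # first non-null value per column

print(first_rows)
print(first_vals)
```

Here head(1) returns rows 0 and 2 untouched, while first() gives a=10.0, b="x" for id 1 (a taken from row 1, b from row 0) and a=20.0, b="z" for id 2. If you need real rows, head(1) is probably the safer choice; if you want the first available value per column, first() fits better.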


Thanks a lot – canyon289