Pandas transform() vs apply()

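I don't understand why apply and transform return different dtypes when called on the same data frame. The way I had explained the two functions to myself went something along the lines of "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following.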

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [1,1,0,0,1,0,0,0,0,1]})

Let's identify those ids that have a nonzero entry in the cat column.

>>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any()) 
id 
1  True 
2  True 
3 False 
4  True 
Name: cat, dtype: bool

Great. If we wanted to create an indicator column, however, we could do the following.

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any()) 
0 1 
1 1 
2 1 
3 1 
4 1 
5 1 
6 1 
7 0 
8 0 
9 1 
Name: cat, dtype: int64

I don't understand why the dtype is now int64 instead of the boolean returned by the any() function.

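When I change the original data frame to contain some booleans (note that the zeros remain), the transform approach returns booleans in an object column. This is an extra mystery to me, since all of the returned values are boolean, yet the column is listed as object, apparently to match the dtype of the original mixed-type column of integers and booleans.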

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4], 
        'cat': [True,True,0,0,True,0,0,0,0,True]}) 

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any()) 
0  True 
1  True 
2  True 
3  True 
4  True 
5  True 
6  True 
7 False 
8 False 
9  True 
Name: cat, dtype: object

However, when I use all booleans, the transform function returns a boolean column.

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4], 
        'cat': [True,True,False,False,True,False,False,False,False,True]}) 

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any()) 
0  True 
1  True 
2  True 
3  True 
4  True 
5  True 
6  True 
7 False 
8 False 
9  True 
Name: cat, dtype: bool
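
To make the three cases above easier to compare side by side, here is a minimal sketch that builds each variant of the cat column and prints its input dtype next to the dtype that transform returns. The names `columns`, `input_dtypes`, and `frame` are only illustrative. The result dtypes are deliberately not hard-coded, because they may depend on the installed pandas version; the transcripts above show what was observed when the question was asked.

```python
import pandas as pd

ids = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4]

# The three variants of the 'cat' column used in the question.
columns = {
    'ints': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1],
    'mixed': [True, True, 0, 0, True, 0, 0, 0, 0, True],
    'bools': [True, True, False, False, True, False, False, False, False, True],
}

input_dtypes = {}
for name, values in columns.items():
    frame = pd.DataFrame({'id': ids, 'cat': values})
    input_dtypes[name] = str(frame['cat'].dtype)
    result = frame.groupby('id')['cat'].transform(lambda x: (x == 1).any())
    # Print the input dtype next to the dtype transform produced.
    print(name, input_dtypes[name], '->', result.dtype)
```

Note that the mixed list of booleans and integers is already stored as an object column before any grouping happens.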

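Using my acute pattern-recognition skills, it appears that the dtype of the resulting column mirrors that of the original column. I would appreciate any hints about why this occurs, or about what is going on under the hood in the transform function. Cheers.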

2017-01-05 3novak

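'apply' doesn't collapse the data. 'apply' is very flexible and can return a series or data frame of any size. 'transform' always preserves the number of rows in each group. 'transform' also sends each individual column as a series to the calling function. 'apply' sends the entire data frame to the calling function. –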
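
The row-count point in this comment can be checked with a minimal sketch, assuming the same example frame as in the question. The variable names `reduced`, `partial`, and `broadcast` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})
grouped = df.groupby('id')['cat']

# apply with a scalar-returning function: one value per group.
reduced = grouped.apply(lambda x: (x == 1).any())

# apply with a function that returns part of each group:
# the output size is whatever the function hands back.
partial = grouped.apply(lambda x: x.head(2))

# transform with the same scalar-returning function:
# the value is broadcast back to every row of its group.
broadcast = grouped.transform(lambda x: (x == 1).any())

print(len(df), len(reduced), len(partial), len(broadcast))
```

Only the transform result is guaranteed to line up row for row with the original frame, which is why it can be assigned directly as a new column.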

[Related](http://stackoverflow.com/a/38579754/2336654) – piRSquared

Aha! Thanks @piRSquared. I think I understand better why this occurs after reading that comment and looking at the source code. – 3novak

It looks like SeriesGroupBy.transform() tries to cast the result dtype to the same one the original column has, but DataFrameGroupBy.transform() doesn't seem to do that:

In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any()) 
Out[139]: 
0 1 
1 1 
2 1 
3 1 
4 1 
5 1 
6 1 
7 0 
8 0 
9 1 
Name: cat, dtype: int64 

#                         v       v
In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any()) 
Out[140]: 
    cat 
0 True 
1 True 
2 True 
3 True 
4 True 
5 True 
6 True 
7 False 
8 False 
9 True 

In [141]: df.dtypes 
Out[141]: 
cat int64 
id  int64 
dtype: object
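
If the goal is simply a boolean indicator column regardless of which dtype transform hands back, two options can be sketched as follows. This is only an illustration under the assumption that an explicit cast is acceptable; the column names `has_one` and `has_one_mapped` are hypothetical, and the dtype printed before the cast may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})

# Option 1: transform, then cast explicitly so the result does not
# depend on whether pandas casts back to the input dtype.
raw = df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
print(raw.dtype)  # version-dependent; int64 in the transcript above
df['has_one'] = raw.astype(bool)

# Option 2: reduce to one value per id with apply, then map it
# back onto the rows through the 'id' column.
per_id = df.groupby('id')['cat'].apply(lambda x: (x == 1).any())
df['has_one_mapped'] = df['id'].map(per_id)

print(df)
```

Both columns carry the same values; the explicit cast in the first option just makes the intended dtype visible in the code.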

2017-01-07 23:38:18 MaxU
