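Apply to a non-grouped DataFrame?

I'm trying to implement a simple function that picks one group at random and marks it as True. How do I apply it to a non-grouped DataFrame?

The DataFrame: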

In [145]: df = pd.DataFrame({'a': [1,1,1,2,2], 'b': [3,3,3,3,3]}) 

In [146]: df 
Out[146]: 
   a  b
0  1  3
1  1  3
2  1  3
3  2  3
4  2  3

The function:

def pickone(df, group, out): 
    u = df[group].unique() 
    p = np.random.choice(u, 1)[0] 
    df[out] = False 
    df[df[group]==p][out] = True 
    return df

Applying it to a grouped DataFrame works fine:

In [148]: df.groupby(['b']).apply(pickone, group="a", out="c") 
Out[148]: 
   a  b      c
0  1  3   True
1  1  3   True
2  1  3   True
3  2  3  False
4  2  3  False

But it fails on a non-grouped DataFrame:

In [149]: df.apply(pickone, group="a", out="c") 
--------------------------------------------------------------------------- 
TypeError         Traceback (most recent call last) 
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas/_libs/index.c:5085)() 

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item (pandas/_libs/hashtable.c:13892)() 

TypeError: an integer is required 

During handling of the above exception, another exception occurred: 

KeyError         Traceback (most recent call last) 
<ipython-input-149-86c0d6e0e423> in <module>() 
----> 1 df.apply(pickone, group="a", out="c")

2017-07-23 salient

df is a DataFrame, whereas df.groupby(...) is a DataFrameGroupBy object. DataFrame.apply and DataFrameGroupBy.apply are two entirely different methods.

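df.apply calls the function once per column (the default, axis=0) or once per row (axis=1). The function receives a Series (a column or a row) as its first argument.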

df.groupby(...).apply calls the function once per group. The function receives a (sub-)DataFrame as its first argument.
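To make the difference concrete, here is a minimal sketch that records what each apply hands to the function. The helper names record_series and record_frame are made up for illustration and are not part of pandas. It also hints at why the question's call fails: pickone receives a Series indexed 0..4, so looking up "a" in it raises a KeyError.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2], 'b': [3, 3, 3, 3, 3]})

# Hypothetical helpers: they only record the type of the first argument.
seen_by_df_apply = []
seen_by_groupby_apply = []

def record_series(obj):
    seen_by_df_apply.append(type(obj).__name__)
    return 0

def record_frame(obj):
    seen_by_groupby_apply.append(type(obj).__name__)
    return 0

# DataFrame.apply: the function is called with one column at a time.
df.apply(record_series)

# DataFrameGroupBy.apply: the function is called with one sub-DataFrame per group.
df.groupby('b').apply(record_frame)

print(set(seen_by_df_apply))
print(set(seen_by_groupby_apply))
```

Depending on the pandas version, the groupby call may emit a deprecation warning about grouping columns, but the function should still receive a DataFrame.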

To call pickone on df, use

pickone(df, group='a', out='c')

rather than df.apply(pickone, ...).

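By the way,

df[df[group]==p][out] = True

is an assignment using chained indexing. For some DataFrames, df[df[group]==p] may return a new DataFrame whose data is copied from df, so df[df[group]==p][out] = True may modify that new DataFrame rather than df itself.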
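A small sketch of the pitfall, assuming a recent pandas where selecting rows with a boolean mask returns a copy (behaviour and the exact warning can differ between versions). The variable name chained_changed_df is only for illustration.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2], 'b': [3, 3, 3, 3, 3]})
df['c'] = False

with warnings.catch_warnings():
    # pandas usually warns about this pattern; silence it for the demo.
    warnings.simplefilter('ignore')
    # Chained indexing: the row selection creates a temporary object,
    # and the assignment writes into that temporary, not into df.
    df[df['a'] == 1]['c'] = True

# Did any row of the original df actually get marked?
chained_changed_df = bool(df['c'].any())
print(chained_changed_df)
```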

Assignment with chained indexing is therefore considered a no-no. Use df.loc instead:

df[out] = False 
df.loc[df[group]==p, out] = True

or, in this case,

df[out] = (df[group]==p)

is enough.
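Putting the pieces together, a revised pickone might look like the sketch below. It keeps the question's signature and, like the original, modifies the DataFrame it is given and returns it. Treat it as one possible version, not the only way to write it.

```python
import numpy as np
import pandas as pd

def pickone(df, group, out):
    """Mark every row of one randomly chosen group as True in column `out`."""
    u = df[group].unique()        # distinct group labels
    p = np.random.choice(u)       # pick one label at random
    df[out] = (df[group] == p)    # single assignment, no chained indexing
    return df

df = pd.DataFrame({'a': [1, 1, 1, 2, 2], 'b': [3, 3, 3, 3, 3]})

# Call the function directly on the non-grouped DataFrame.
result = pickone(df, group='a', out='c')
print(result)
```

Because the group is chosen at random, which rows end up True varies from run to run; call np.random.seed beforehand if a reproducible choice is needed.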

2017-07-23 14:42:58 unutbu
