Pandas: operations using groupby yield SettingWithCopyWarning

Say I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

When I try to group by the team column and perform an operation on each group, I get a SettingWithCopyWarning:

for team, team_df in df.groupby(by='team'): 
    # team_df = team_df.copy() # produces no warning 
    team_df['rank'] = 10 # produces warning 
    team_df.loc[:, 'rank'] = 10 # produces warning 

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. 
Try using .loc[row_index,col_indexer] = value instead 
df_team['rank'] = 10

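If I uncomment the line that makes a copy of the sub-DataFrame, I don't get the warning. Is this the best practice for avoiding this warning, or am I doing something wrong?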

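Note I do not want to edit the original DataFrame df. I also know this example could be done in a better way, but my use case is more complex: it requires grouping the original DataFrame and performing a series of operations based on a different DataFrame and the specifics of each unique group.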

Source

2017-07-14 Johnny Metz

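Once you have digested this article and feel confident that you know how to avoid chained indexing (by using .loc or .iloc), you can turn off the SettingWithCopyWarning with pd.options.mode.chained_assignment = None and never be bothered by this warning again.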

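Since you wrote

Note I do not want to edit the original DataFrame df.

and you correctly used .loc to assign to team_df, it is clear you already know that modifying the copy (team_df) will not modify the original (df), so the SettingWithCopyWarning emitted here is just a nuisance.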
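A minimal sketch of that point, reusing the df from the question. The names pieces and combined are just illustrative, and depending on the pandas version the assignment may or may not emit the warning:

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

pieces = []  # illustrative name: collects the modified per-group frames
for team, team_df in df.groupby(by='team'):
    # May emit SettingWithCopyWarning on pandas versions that have it;
    # either way, only the per-group frame gets the new column.
    team_df['rank'] = 10
    pieces.append(team_df)

# Build a new DataFrame from the groups; the original df is left alone.
combined = pd.concat(pieces)

print(list(df.columns))        # ['team', 'player']
print(list(combined.columns))  # ['team', 'player', 'rank']
```

The original df keeps its two columns, while the frame rebuilt from the groups carries the new one.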

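SettingWithCopyWarning comes up in all sorts of situations where you are coding correctly, even when using .loc or .iloc. There is no "proper" way to code that avoids sometimes triggering the SettingWithCopyWarning.

Therefore, I would just use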

pd.options.mode.chained_assignment = None

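to turn off the warning globally.

I generally do not recommend using team_df = team_df.copy() just to avoid SettingWithCopyWarnings: copying a DataFrame can be a performance drain, especially when the DataFrame is large or the copy is made many times in a loop.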

If you want to turn off the warning in just one location, you can use

team_df.is_copy = False

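It has much the same effect, but without the performance cost. Note, however, that it is not mentioned in the official Pandas API, so it may not be guaranteed to exist, or to be usable for this purpose, in all future versions of pandas. So if robustness is a priority but performance is not, then perhaps use team_df = team_df.copy(). But I think that for an experienced Pandas programmer the sounder approach is either to turn the warning off globally or, if you want to be very careful, to leave the warning on, check each occurrence manually, and accept that it will sometimes be triggered by correct code.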
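Since is_copy is undocumented (and, I believe, was deprecated in later pandas releases), here is a hedged sketch of a different way to silence the warning in one place only. It swaps in Python's standard warnings module rather than anything pandas-specific, and note that it silences every warning raised inside the with block, not only SettingWithCopyWarning. The name group_frames is just illustrative:

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden']})

group_frames = {}  # illustrative name: team -> modified per-group frame
for team, team_df in df.groupby(by='team'):
    with warnings.catch_warnings():
        # Ignore all warnings, but only for the statements in this block.
        warnings.simplefilter('ignore')
        team_df['rank'] = 10
    group_frames[team] = team_df

# Outside the with block the previous warning filters are restored.
print(sorted(group_frames))  # ['Rockets', 'Warriors']
```

This avoids both the global switch and the extra copy, at the price of hiding any other warning raised by the wrapped statements.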

Source

2017-07-14 19:01:42 unutbu

Great link to the article! – piRSquared

@piRSquared: I learned of the [article](https://www.dataquest.io/blog/settingwithcopywarning/) from [Alexander](https://stackoverflow.com/users/2411802/alexander), [here](https://stackoverflow.com/questions/38809796/pandas-still-getting-settingwithcopywarning-even-after-using-loc/38810015#comment76884959_38809796). – unutbu

The pandas split-apply-combine docs are not great on this point. This should point you in the right direction:

def apply_fun(team_df): 
    team_df['rank'] = 10 
    return team_df 

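# Option 1: apply a function to each group and combine the results
df.groupby('team').apply(apply_fun)
# Option 2: per-group values via transform ('column' stands for a numeric column in df)
df['column_rank'] = df.groupby('team')['column'].transform(lambda x: x.rank())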
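A runnable sketch of the transform idea, under some assumptions: the score column and its values are made up purely for illustration (the question's df has no numeric column), and assign is used so the result lands in a new DataFrame instead of editing df, which is what the question asked for:

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Warriors', 'Warriors', 'Warriors', 'Rockets', 'Rockets'],
    'player': ['Stephen Curry', 'Klay Thompson', 'Kevin Durant', 'Chris Paul', 'James Harden'],
    # Hypothetical numbers, for illustration only.
    'score': [30, 20, 25, 18, 29]})

# transform returns a Series aligned with df's index, so the per-group
# ranks can be attached as a column; assign returns a new DataFrame.
ranked = df.assign(
    score_rank=df.groupby('team')['score'].transform(
        lambda x: x.rank(ascending=False)))

print(ranked['score_rank'].tolist())  # [1.0, 3.0, 2.0, 2.0, 1.0]
print('score_rank' in df.columns)     # False
```

No per-group sub-DataFrame is ever assigned to here, so there is nothing for SettingWithCopyWarning to complain about.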

Source

2017-07-25 12:18:02 citynorman
