Pandas: Impute NaN's

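I have an incomplete dataframe, incomplete_df, as shown below. I want to impute the missing amount values with the average amount of the corresponding id. If the average for that specific id is itself NaN (see id=4), I want to use the overall average instead.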

Below is the sample data and my very inefficient solution:

import pandas as pd
import numpy as np

incomplete_df = pd.DataFrame({'id': [1,2,3,2,2,3,1,1,1,2,4],
                              'type': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two', 'one', 'three','one'],
                              'amount': [345,928,np.nan,645,113,942,np.nan,539,np.nan,814,np.nan]
                              }, columns=['id','type','amount'])

# mean amount per id; dropna() removes ids whose mean is NaN (id 4) so they reach the else branch
means = incomplete_df.groupby('id')[['amount']].mean().dropna()

# Forrest Gump Solution
for idx in incomplete_df.index[np.isnan(incomplete_df.amount)]: # loop through all rows with amount = NaN
    cur_id = incomplete_df.loc[idx, 'id']
    if cur_id in means.index:
        incomplete_df.loc[idx, 'amount'] = means.loc[cur_id, 'amount'] # average amount of that specific id
    else:
        incomplete_df.loc[idx, 'amount'] = np.mean(means.amount) # average amount across all id's

What is the fastest and most Pythonic/pandonic way to achieve this?


2014-01-10 Rhubarb

In 0.13 you can do this: http://pandas.pydata.org/pandas-docs/dev/missing_data.html#interpolation, and see the missing-data section: http://pandas.pydata.org/pandas-docs/dev/missing_data.html#cleaning-filling-missing-data – Jeff

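@Jeff, thanks. What I have is not a time series, so statistically speaking I am looking for imputation rather than (inter/extra)polation. How do I handle the case where the mean for a given id is itself NaN? – Rhubarb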

See @DSM's solution below for your idea. But FYI, interpolation works on frames too; it does not have to be a time series. There are lots of options. – Jeff
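To illustrate Jeff's point, here is a minimal sketch of DataFrame.interpolate on a frame with a plain integer index and no dates (the frame and the column name x are made up for illustration). Note that it fills each gap from the neighbouring rows by position, not from rows sharing the same id, which is why it is not the per-id imputation asked for here.

```python
import numpy as np
import pandas as pd

# A plain DataFrame with the default RangeIndex; no dates involved.
frame = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan, 7.0]})

# The default method="linear" treats the rows as equally spaced and
# fills each interior NaN from its neighbours above and below.
filled = frame.interpolate()
print(filled["x"].tolist())  # [1.0, 2.0, 3.0, 5.0, 7.0]
```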

Disclaimer: I'm not really interested in the truly fastest solution, but in the most pandas-idiomatic one.

Here (with df being the question's incomplete_df), I think it would be something like:

>>> df["amount"] = df["amount"].fillna(df.groupby("id")["amount"].transform("mean"))
>>> df["amount"] = df["amount"].fillna(df["amount"].mean())

which produces

>>> df
    id   type  amount
0    1    one   345.0
1    2    one   928.0
2    3    two   942.0
3    2  three   645.0
4    2    two   113.0
5    3  three   942.0
6    1    one   442.0
7    1    two   539.0
8    1    one   442.0
9    2  three   814.0
10   4    one   615.2

[11 rows x 3 columns]
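For newer pandas versions, here is a self-contained sketch of the same two-step fill. It assigns the result back rather than calling fillna(..., inplace=True) on a selected column, since under copy-on-write that chained in-place call may not update df. It also keeps the intermediate state to show that after the per-id fill only the id 4 row is still NaN, which is what the second step handles.

```python
import numpy as np
import pandas as pd

# Sample data from the question (np.nan instead of the np.NAN alias,
# which NumPy 2.0 removed).
df = pd.DataFrame({
    "id": [1, 2, 3, 2, 2, 3, 1, 1, 1, 2, 4],
    "type": ["one", "one", "two", "three", "two", "three",
             "one", "two", "one", "three", "one"],
    "amount": [345, 928, np.nan, 645, 113, 942, np.nan, 539, np.nan, 814, np.nan],
})

# Step 1: fill each NaN with the mean amount of its own id.
# transform("mean") returns a Series aligned with df's index.
df["amount"] = df["amount"].fillna(df.groupby("id")["amount"].transform("mean"))
after_step1 = df["amount"].copy()

# Step 2: id 4 has no observed amounts, so its group mean is NaN and
# the row is still missing; fall back to the mean of the whole column.
df["amount"] = df["amount"].fillna(df["amount"].mean())

print(after_step1.isna().sum())  # 1 (only the id 4 row)
print(df["amount"].tolist())
```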

There are lots of obvious tweaks, depending on exactly how you want to chain the imputation steps.
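One such tweak, sketched here under the assumption that this is the behaviour wanted: take the fallback from the originally observed amounts, computed before the per-id fill, so the imputed values do not feed into it. On the question's data this gives 618.0 for id 4 instead of 615.2.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 2, 2, 3, 1, 1, 1, 2, 4],
    "amount": [345, 928, np.nan, 645, 113, 942, np.nan, 539, np.nan, 814, np.nan],
})

# Fallback from observed values only, taken before any imputation:
# 4326 / 7 = 618.0
overall_mean = df["amount"].mean()
group_mean = df.groupby("id")["amount"].transform("mean")

# Chain the fills: per-id mean first, then the precomputed overall mean.
df["amount"] = df["amount"].fillna(group_mean).fillna(overall_mean)
print(df.loc[df["id"] == 4, "amount"].tolist())  # [618.0]
```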


2014-01-10 17:29:34 DSM

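Thanks DSM. If I had a placeholder such as 0 instead of NaN, would the fastest way be to replace all the 0s with NaN and then follow your solution? (I know using 0 instead of NaN is a silly idea; alas, I have to work with it.) – Rhubarb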

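That's probably what I'd do. Using NaN to represent missing data runs pretty deep in pandas, so the simplest native way to do something often requires getting your data to line up with that convention. I don't know about fastest; you could time the alternatives. In any case, swapping 0 for NaN is linear and vectorized, so it won't add much to the runtime. – DSM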
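A minimal sketch of that approach on hypothetical data (the values below are made up): replace(0, np.nan) swaps the placeholder in one vectorized pass, then the answer's two fill steps run unchanged. This assumes 0 is never a genuine amount; otherwise real zeros would be imputed as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data where 0 was used as the "missing" marker.
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 3],
    "amount": [10.0, 0.0, 0.0, 30.0, 0.0],
})

# Vectorized swap: every placeholder 0 becomes NaN.
df["amount"] = df["amount"].replace(0, np.nan)

# Same two-step fill as in the answer: per-id mean, then overall mean.
df["amount"] = df["amount"].fillna(df.groupby("id")["amount"].transform("mean"))
df["amount"] = df["amount"].fillna(df["amount"].mean())
print(df["amount"].tolist())  # [10.0, 30.0, 10.0, 30.0, 20.0]
```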

Just to confirm: are those two alternatives, rather than two steps both needed for a single operation? –
