2017-06-23

Filling missing values in a pandas DataFrame in Python. I have a DataFrame like this as an example:

import pandas as pd 

#create dataframe 
df = pd.DataFrame([['DE', 'Table',201705,201705, 1000], ['DE', 'Table',201705,201704, 1000],\ 
        ['DE', 'Table',201705,201702, 1000], ['DE', 'Table',201705,201701, 1000],\ 
        ['AT', 'Table',201708,201708, 1000], ['AT', 'Table',201708,201706, 1000],\ 
        ['AT', 'Table',201708,201705, 1000], ['AT', 'Table',201708,201704, 1000]],\ 
        columns=['ISO','Product','Billed Week', 'Created Week', 'Billings']) 
print (df) 

  ISO Product  Billed Week  Created Week  Billings
0  DE   Table       201705        201705      1000
1  DE   Table       201705        201704      1000
2  DE   Table       201705        201702      1000
3  DE   Table       201705        201701      1000
4  AT   Table       201708        201708      1000
5  AT   Table       201708        201706      1000
6  AT   Table       201708        201705      1000
7  AT   Table       201708        201704      1000

What I need to do is fill in the missing data with 0 Billings for each groupby of ['ISO', 'Product'] wherever there is a break in the sequence, i.e. a week in which no billings were created and therefore no row exists. The range should be based on the maximum 'Billed Week' and the minimum 'Created Week' for each combination; that full range of weeks should be present, with no breaks in the sequence.

So for the case above, the missing records that I need to add programmatically to the DataFrame look like this:

  ISO Product  Billed Week  Created Week  Billings
0  DE   Table       201705        201703         0
1  AT   Table       201708        201707         0
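
To make the rule concrete: for each ['ISO', 'Product'] group the complete set of Created Weeks runs from that group's minimum Created Week up to its maximum Billed Week. A rough sketch of how those bounds could be derived (using the df built above, and assuming all weeks fall within the same year, so consecutive weeks differ by 1 in the YYYYWW integer):

# Rough sketch only: list the Created Weeks each ['ISO', 'Product'] group
# should contain, from its minimum Created Week up to its maximum Billed Week.
bounds = df.groupby(['ISO', 'Product']).agg({'Created Week': 'min', 'Billed Week': 'max'})
for (iso, product), row in bounds.iterrows():
    expected = list(range(row['Created Week'], row['Billed Week'] + 1))
    print(iso, product, expected)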

Answers

def seqfix(x):
    # Remember the Created Weeks this group actually has
    s = x['Created Week']
    # Reindex on Created Week over the full range, which inserts the
    # missing weeks as NaN rows
    x = x.set_index('Created Week')
    x = x.reindex(range(min(s), max(s) + 1))
    # Missing weeks get 0 Billings; the other columns are forward-filled
    x['Billings'] = x['Billings'].fillna(0)
    x = x.ffill().reset_index()
    return x

df = df.groupby(['ISO', 'Billed Week']).apply(seqfix).reset_index(drop=True)
df[['Billed Week', 'Billings']] = df[['Billed Week', 'Billings']].astype(int)
df = df[['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings']]

print(df) 

  ISO Product  Billed Week  Created Week  Billings
0  AT   Table       201708        201704      1000
1  AT   Table       201708        201705      1000
2  AT   Table       201708        201706      1000
3  AT   Table       201708        201707         0
4  AT   Table       201708        201708      1000
5  DE   Table       201705        201701      1000
6  DE   Table       201705        201702      1000
7  DE   Table       201705        201703         0
8  DE   Table       201705        201704      1000
9  DE   Table       201705        201705      1000
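
As a quick sanity check (not part of the original answer), you could confirm that every (ISO, Billed Week) group now covers a contiguous range of Created Weeks:

# Hypothetical check: True for a group means its Created Weeks have no gaps.
gap_free = df.groupby(['ISO', 'Billed Week'])['Created Week'].agg(
    lambda s: set(s) == set(range(s.min(), s.max() + 1)))
print(gap_free)  # expect True for every group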

I accidentally edited your answer, but I have rolled it back. Sorry about that. – Allen


@Allen No problem at all! – su79eu7k


This is my solution. I'm sure some genius will come up with a better one ~ let's wait for it ~

import numpy as np

# Per ISO, take the maximum Billed Week and the minimum Created Week
df1 = df.groupby('ISO').agg({'Billed Week': np.max, 'Created Week': np.min})
df1['ISO'] = df1.index

     Created Week  Billed Week ISO
ISO
AT         201704       201708  AT
DE         201701       201705  DE

ISO = []
BilledWeek = []
CreateWeek = []
for i in range(len(df1)):
    row = df1.iloc[i]  # .iloc replaces the deprecated .ix used originally
    n_weeks = row['Billed Week'] - row['Created Week'] + 1
    # Repeat Billed Week and ISO once for every week in the full range
    BilledWeek.extend([row['Billed Week']] * n_weeks)
    CreateWeek.extend(list(range(row['Created Week'], row['Billed Week'] + 1)))
    ISO.extend([row['ISO']] * n_weeks)

# Skeleton with one row per expected (BilledWeek, CreateWeek, ISO) combination
DF = pd.DataFrame({'BilledWeek': BilledWeek, 'CreateWeek': CreateWeek, 'ISO': ISO})
# Left-join the original data onto the skeleton; missing weeks get NaN Billings
Target = DF.merge(df, left_on=['BilledWeek', 'CreateWeek', 'ISO'],
                  right_on=['Billed Week', 'Created Week', 'ISO'], how='left')
Target.Billings.fillna(0, inplace=True)
Target = Target.drop(['Billed Week', 'Created Week'], axis=1)
# Fill Product for the newly created rows from earlier rows of the same ISO
Target['Product'] = Target.groupby('ISO')['Product'].ffill()

Out[75]: 
   BilledWeek  CreateWeek ISO Product  Billings
0      201708      201704  AT   Table    1000.0
1      201708      201705  AT   Table    1000.0
2      201708      201706  AT   Table    1000.0
3      201708      201707  AT   Table       0.0
4      201708      201708  AT   Table    1000.0
5      201705      201701  DE   Table    1000.0
6      201705      201702  DE   Table    1000.0
7      201705      201703  DE   Table       0.0
8      201705      201704  DE   Table    1000.0
9      201705      201705  DE   Table    1000.0
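
If the result should match the question's original column names and order, a small optional clean-up (my addition, not part of the answer above) could be appended:

# Optional clean-up: rename the skeleton columns back to the original names,
# restore the original column order and cast Billings back to int.
Target = Target.rename(columns={'BilledWeek': 'Billed Week', 'CreateWeek': 'Created Week'})
Target = Target[['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings']]
Target['Billings'] = Target['Billings'].astype(int)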

Thanks for that @Wen, it looks several levels above the hack I had in mind. I'll have to check it later and will let you know whether it meets my needs on my large DataFrame as well as on this test DataFrame. – IcemanBerlin


Okay, just let me know if the code doesn't work for you ~ – Wen


Build a MultiIndex that fills all the gaps in Created Week, then reindex the original df against it.

# For each Billed Week, build the full list of (ISO, Product, Billed Week, Created Week)
# tuples, covering every week from the group's min to max Created Week
idx = (df.groupby(['Billed Week'])
       .apply(lambda x: [(x['ISO'].min(),
                          x['Product'].min(),
                          x['Billed Week'].min(),
                          e) for e in range(x['Created Week'].min(), x['Created Week'].max()+1)])
       .tolist()
)

# Flatten the list of lists into one list of tuples and build the MultiIndex
multi_idx = pd.MultiIndex.from_tuples(sum(idx, []), names=['ISO', 'Product', 'Billed Week', 'Created Week'])

# Reindexing on the complete MultiIndex inserts the missing weeks as NaN rows,
# which fillna(0) then turns into 0 Billings
(df.set_index(['ISO', 'Product', 'Billed Week', 'Created Week'])
   .reindex(multi_idx)
   .reset_index()
   .fillna(0)
)

Out[671]: 
  ISO Product  Billed Week  Created Week  Billings
0  DE   Table       201705        201701    1000.0
1  DE   Table       201705        201702    1000.0
2  DE   Table       201705        201703       0.0
3  DE   Table       201705        201704    1000.0
4  DE   Table       201705        201705    1000.0
5  AT   Table       201708        201704    1000.0
6  AT   Table       201708        201705    1000.0
7  AT   Table       201708        201706    1000.0
8  AT   Table       201708        201707       0.0
9  AT   Table       201708        201708    1000.0
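
One small follow-up (not in the original answer): reindexing introduces NaN for the missing weeks, so after fillna(0) the Billings column is float. If integer Billings are needed, the chained expression can be bound to a variable (here called result, a name of my choosing) and cast at the end:

# Bind the chained result and cast Billings back to int ('result' is a
# hypothetical name, not used in the answer above).
result = (df.set_index(['ISO', 'Product', 'Billed Week', 'Created Week'])
            .reindex(multi_idx)
            .reset_index()
            .fillna(0))
result['Billings'] = result['Billings'].astype(int)
print(result)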