How to use group by and return rows with null values

I have a dataset of emails and purchases like the one below. How can I use group by and still return the rows with null values?

Email   Purchaser order_id amount 
[email protected] [email protected] 1   5 
[email protected]   
[email protected] [email protected] 2   10 
[email protected] [email protected] 3   5

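I want to find the total number of people in the dataset, the number of purchasers, the total number of orders, and the total revenue. I know how to do this in SQL with a left join and aggregate functions, but I don't know how to replicate it with Python/pandas.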

In Python, I tried using pandas and numpy like this:

table1 = table.groupby(['Email', 'Purchaser']).agg({'amount': np.sum, 'order_id': 'count'}) 

table1.agg({'Email': 'count', 'Purchaser': 'count', 'amount': np.sum, 'order_id': 'count'})

The problem is that it only returns the rows with orders (rows 1 and 3), not the one without (row 2):

Email   Purchaser  order_id amount 
[email protected] [email protected] 1   5 
[email protected] [email protected] 2   15

The SQL query should look like this:

SELECT count(Email) as num_ind, count(Purchaser) as num_purchasers, sum(order) as orders , sum(amount) as revenue 
    FROM 
     (SELECT Email, Purchaser, count(order_id) as order, sum(amount) as amount 
     FROM table 1 
     GROUP BY Email, Purchaser) x

How can I replicate this in Python?

Source

2015-12-28 Rabbit K

Is Purchaser 'NA' or 'NaN'? If so, you can use 'dropna()' to get the result. – WoodChopper

Welcome to StackOverflow - you can read the [tour](http://stackoverflow.com/tour). – jezrael

This is not implemented in pandas at the moment - see.
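A note for readers on newer pandas: as far as I know, `groupby` accepts a `dropna` parameter from version 1.1 onward, and `dropna=False` keeps groups whose key is NaN. The sketch below assumes that version; the addresses are placeholders, since the real ones are not shown in the question. On older versions the parameter does not exist and the limitation above still applies.

```python
import numpy as np
import pandas as pd

# Placeholder data: the addresses below are made up for illustration.
table = pd.DataFrame({
    "Email": ["a@example.com", "b@example.com", "c@example.com", "c@example.com"],
    "Purchaser": ["a@example.com", np.nan, "c@example.com", "c@example.com"],
    "order_id": [1, np.nan, 2, 3],
    "amount": [5, np.nan, 10, 5],
})

# dropna=False (assumes pandas >= 1.1) keeps the group whose Purchaser is NaN.
per_person = (
    table.groupby(["Email", "Purchaser"], dropna=False)
    .agg(orders=("order_id", "count"), amount=("amount", "sum"))
    .reset_index()
)
print(per_person)
```

Note that in recent pandas versions the sum of an all-NaN group is 0 rather than NaN, so the `amount` for the non-purchaser may differ from the output shown further down.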

So one horrible workaround is to replace NaN with some string and, after agg, replace it back with NaN:

print(table)
     Email Purchaser order_id amount 
0 [email protected] [email protected]   1  5 
1 [email protected]   NaN  NaN  NaN 
2 [email protected] [email protected]   2  10 
3 [email protected] [email protected]   3  5 

table['Purchaser'] = table['Purchaser'].replace(np.nan, 'dummy') 
print(table)
     Email Purchaser order_id amount 
0 [email protected] [email protected]   1  5 
1 [email protected]  dummy  NaN  NaN 
2 [email protected] [email protected]   2  10 
3 [email protected] [email protected]   3  5 

table1 = table.groupby(['Email', 'Purchaser']).agg({'amount': np.sum, 'order_id': 'count'}) 
print(table1)
         order_id amount 
Email  Purchaser      
[email protected] [email protected]   1  5 
[email protected] dummy    0  NaN 
[email protected] [email protected]   2  15 

table1 = table1.reset_index() 
table1['Purchaser'] = table1['Purchaser'].replace('dummy', np.nan) 
print(table1)
     Email Purchaser order_id amount 
0 [email protected] [email protected]   1  5 
1 [email protected]   NaN   0  NaN 
2 [email protected] [email protected]   2  15
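To finish replicating the SQL, the outer query still has to be applied to the grouped result. The following is a minimal sketch of one way to do that, not necessarily the only one. It rebuilds the sample data with placeholder addresses (the real ones are not shown), repeats the placeholder workaround, and relies on `count()` skipping NaN the way SQL `COUNT(column)` skips NULL. The names `inner` and `summary` are mine.

```python
import numpy as np
import pandas as pd

# Placeholder data: the addresses below are made up for illustration.
table = pd.DataFrame({
    "Email": ["a@example.com", "b@example.com", "c@example.com", "c@example.com"],
    "Purchaser": ["a@example.com", np.nan, "c@example.com", "c@example.com"],
    "order_id": [1, np.nan, 2, 3],
    "amount": [5, np.nan, 10, 5],
})

# Inner query: one row per (Email, Purchaser); 'dummy' keeps the NaN group alive.
inner = (
    table.fillna({"Purchaser": "dummy"})
    .groupby(["Email", "Purchaser"])
    .agg(orders=("order_id", "count"), amount=("amount", "sum"))
    .reset_index()
)
inner["Purchaser"] = inner["Purchaser"].replace("dummy", np.nan)

# Outer query: count() ignores NaN, like COUNT(column) in SQL.
summary = pd.Series({
    "num_ind": inner["Email"].count(),
    "num_purchasers": inner["Purchaser"].count(),
    "orders": inner["orders"].sum(),
    "revenue": inner["amount"].sum(),
})
print(summary)
```

With the four sample rows this should report 3 individuals, 2 purchasers, 3 orders and a revenue of 20.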

Source

2015-12-28 08:27:06 jezrael

Thank you very much! The solution works perfectly. –
