2017-02-01

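I'm very new to R and sqldf and can't seem to solve a basic problem: selecting the highest amount spent on a single order. I have a transactions file in which each row represents one purchased product.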

The file looks like this:

customer_id, order_number, order_date, amount, product_name
1, 202, 21/04/2015, 58, "xlfd"
1, 275, 16/08/2015, 74, "ghb"
1, 275, 16/08/2015, 36, "fjk"
2, 987, 12/03/2015, 27, "xlgm"
3, 376, 16/05/2015, 98, "fgt"
3, 368, 30/07/2015, 46, "ade"

For each customer_id I need to find the highest amount spent in a single transaction (same order_number). For example, for customer_id "1" it would be 74 + 36 = 110.
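To try the answers below, the sample rows can be typed into R as a data frame. This is a hedged sketch, not part of the question: it assumes the name orders (the one the sqldf answer uses; the other answers call the same data dft or df1) and parses the dates as day/month/year.

```r
# Hypothetical reconstruction of the sample transactions from the question.
# The name `orders` is an assumption matching the sqldf answer.
orders <- data.frame(
  customer_id  = c(1L, 1L, 1L, 2L, 3L, 3L),
  order_number = c(202L, 275L, 275L, 987L, 376L, 368L),
  order_date   = as.Date(c("21/04/2015", "16/08/2015", "16/08/2015",
                           "12/03/2015", "16/05/2015", "30/07/2015"),
                         format = "%d/%m/%Y"),
  amount       = c(58L, 74L, 36L, 27L, 98L, 46L),
  product_name = c("xlfd", "ghb", "fjk", "xlgm", "fgt", "ade"),
  stringsAsFactors = FALSE
)

# Sanity check for customer 1: order 275 holds two products, 74 + 36 = 110
sum(orders$amount[orders$customer_id == 1 & orders$order_number == 275])
#> [1] 110
```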

Answers

Assuming the data frame is named orders, the following nested query gives the required output:

library(sqldf)

sqldf("select customer_id, max(total)
     from (select customer_id, order_number, sum(amount) as total
           from orders
           group by customer_id, order_number)
     group by customer_id")

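Output:

  customer_id max(total)
1           1        110
2           2         27
3           3         98

Original version of this answer (before the update), which returns the total of each order rather than the per-customer maximum:

sqldf("select customer_id, order_number, sum(amount)
     from orders
     group by customer_id, order_number")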

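This returns the total each user spent per purchase, whereas the desired output seems to be only the highest single-purchase amount out of all of a user's purchases. Maybe take this output and extract 'customer_id, max(sum(amount))' with 'group by customer_id'? – Aramis7d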


@Elena Berrone, please accept the answer; see [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers) –


If sqldf is not a strict requirement, and taking your input as dft, you could try:

library(dplyr)  # dplyr also re-exports the %>% pipe from magrittr
dft %>% 
    group_by(customer_id, order_number) %>% 
    summarise(amt = sum(amount)) %>% 
    group_by(customer_id) %>% 
    summarise(max_amt = max(amt)) 

This gives:

Source: local data frame [3 x 2] 
Groups: customer_id [3] 

    customer_id max_amt 
     <int> <int> 
1   1  110 
2   2  27 
3   3  98 

We can also use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'customer_id' and 'order_number' to get the sum of 'amount', then group a second time by 'customer_id' to get the max of 'Sumamount':

library(data.table) 
setDT(df1)[, .(Sumamount = sum(amount)) , .(customer_id, order_number) 
     ][,.(MaxAmount = max(Sumamount)) , customer_id] 
# customer_id MaxAmount 
#1:   1  110 
#2:   2  27 
#3:   3  98 

Or, more compactly: after grouping by 'customer_id', split 'amount' by 'order_number', loop over the resulting list to get each sum, and take the max of those sums to get 'MaxAmount':

setDT(df1)[, .(MaxAmount = max(unlist(lapply(split(amount, 
         order_number), sum)))), customer_id] 
# customer_id MaxAmount 
#1:   1  110 
#2:   2  27 
#3:   3  98 

Or using base R:

aggregate(amount~customer_id, aggregate(amount~customer_id+order_number, 
         df1, sum), FUN = max) 
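A further base R option, offered as a sketch rather than as part of the answer above: tapply() can build a customer-by-order matrix of order totals, and apply() then takes the row-wise maximum. The sample data here is a hypothetical reconstruction holding only the columns this needs.

```r
# Hypothetical sample data (only the columns this sketch needs)
df1 <- data.frame(
  customer_id  = c(1L, 1L, 1L, 2L, 3L, 3L),
  order_number = c(202L, 275L, 275L, 987L, 376L, 368L),
  amount       = c(58L, 74L, 36L, 27L, 98L, 46L)
)

# Customer x order matrix of order totals; combinations that never occur are NA
order_totals <- with(df1, tapply(amount, list(customer_id, order_number), sum))

# Largest order total in each row (customer), ignoring the NA cells
max_amount <- apply(order_totals, 1, max, na.rm = TRUE)
max_amount
#>   1   2   3 
#> 110  27  98 
```

Because the matrix has a cell for every customer/order combination, it can get large on real transaction data; the two-step aggregate() call above avoids building it.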