How to use apply to find the maximum in each group in SparkR

I have the following Spark DataFrame:

agent_product_sale=data.frame(agent=c('a','b','c','d','e','f','a','b','c','a','b'), 
         product=c('P1','P2','P3','P4','P1','p1','p2','p2','P2','P3','P3'), 
         sale_amount=c(1000,2000,3000,4000,1000,1000,2000,2000,2000,3000,3000)) 

RDD_aps=createDataFrame(sqlContext,agent_product_sale) 

    agent product sale_amount 
1  a  P1  1000 
2  b  P1  1000 
3  c  P3  3000 
4  d  P4  4000 
5  d  P1  1000 
6  c  P1  1000 
7  a  P2  2000 
8  b  P2  2000 
9  c  P2  2000 
10  a  P4  4000 
11  b  P3  3000

I need to group the Spark DataFrame by agent and, for each agent, find the product with the highest sale_amount:

 agent most_expensive 
     a   P4   
     b   P3     
     c   P3   
     d   P4

I am using the code below, but it returns the maximum sale_amount for each agent instead of the product:

schema <- structType(structField("agent", "string"), 
structField("max_sale_amount", "double")) 

result <- gapply(
RDD_aps, 
c("agent"), 
function(key, x) { 
y <- data.frame(key,max(x$sale_amount), stringsAsFactors = FALSE) 
}, schema)

2016-09-06 sanaz

Try using 'which.max' – akrun
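The 'which.max' suggestion could look roughly like the sketch below. It assumes the function passed to gapply receives `key` as a list of grouping values and `x` as an ordinary R data.frame, so it can be tried locally without Spark. The name `top_product` and the local `split`/`lapply` simulation are illustrative only, and the actual gapply call is left as a comment because it needs a running Spark session.

```r
# Local copy of the question's data as a plain R data.frame.
agent_product_sale <- data.frame(
  agent = c('a','b','c','d','e','f','a','b','c','a','b'),
  product = c('P1','P2','P3','P4','P1','p1','p2','p2','P2','P3','P3'),
  sale_amount = c(1000,2000,3000,4000,1000,1000,2000,2000,2000,3000,3000),
  stringsAsFactors = FALSE)

# Per-group function in the shape gapply expects (hypothetical name):
# `key` is a list holding the grouping value, `x` holds that group's rows.
top_product <- function(key, x) {
  data.frame(agent = key[[1]],
             # which.max gives the row index of the largest sale_amount;
             # use it to pick the product instead of the amount.
             most_expensive = as.character(x$product[which.max(x$sale_amount)]),
             stringsAsFactors = FALSE)
}

# Simulate the grouping locally to check the per-group logic.
groups <- split(agent_product_sale, agent_product_sale$agent)
local_result <- do.call(rbind, lapply(names(groups), function(a) {
  top_product(list(a), groups[[a]])
}))
print(local_result)

# With SparkR the same function would be passed to gapply (not run here):
# schema <- structType(structField("agent", "string"),
#                      structField("most_expensive", "string"))
# result <- gapply(RDD_aps, "agent", top_product, schema)
# head(collect(result))
```

Note that the second schema field becomes a string, since the function now returns a product name rather than an amount. If several products tie for the maximum, which.max keeps only the first one.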

Or maybe 'gD <- agg(groupBy(RDD_aps, RDD_aps$agent)); agg(arrange(gD, desc(gD$sale_amount)), most_expensive = first(gD$product))' (not tested) – akrun

I may be wrong, but you could call 'groupBy' again after 'arrange' – akrun

With tapply() or aggregate() you can find the maximum within each group:

agent_product_sale=data.frame(agent=c('a','b','c','d','e','f','a','b','c','a','b'), 
         product=c('P1','P2','P3','P4','P1','p1','p2','p2','P2','P3','P3'), 
         sale_amount=c(1000,2000,3000,4000,1000,1000,2000,2000,2000,3000,3000)) 


tapply(agent_product_sale$sale_amount,agent_product_sale$agent, max) 
   a    b    c    d    e    f 
3000 3000 3000 4000 1000 1000 



aggregate(agent_product_sale$sale_amount,by=list(agent_product_sale$agent), max) 
      Group.1 x 
     1  a 3000 
     2  b 3000 
     3  c 3000 
     4  d 4000 
     5  e 1000 
     6  f 1000

aggregate() returns the maxima as a data.frame and tapply() returns them as an array; use whichever you prefer to keep working with the result.
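Both calls above give the maximum amount rather than the product the question asks for. One possible way to get the product in base R is to merge the per-agent maxima back onto the original rows, as in the sketch below. The names `max_by_agent` and `top` are illustrative, and this works on a local data.frame, not on a Spark DataFrame.

```r
# Same data as above, as a plain R data.frame.
agent_product_sale <- data.frame(
  agent = c('a','b','c','d','e','f','a','b','c','a','b'),
  product = c('P1','P2','P3','P4','P1','p1','p2','p2','P2','P3','P3'),
  sale_amount = c(1000,2000,3000,4000,1000,1000,2000,2000,2000,3000,3000),
  stringsAsFactors = FALSE)

# Maximum sale_amount per agent (columns: agent, sale_amount).
max_by_agent <- aggregate(sale_amount ~ agent, data = agent_product_sale, FUN = max)

# Keep only the rows whose sale_amount equals the agent's maximum.
top <- merge(agent_product_sale, max_by_agent, by = c("agent", "sale_amount"))
top <- top[order(top$agent), c("agent", "product", "sale_amount")]
print(top)
```

Unlike which.max, this keeps every product that ties for an agent's maximum, so an agent can appear more than once in the result.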

2016-09-06 06:49:04

ar1 <- arrange(RDD_aps,desc(RDD_aps$sale_amount)) 
collect(summarize(groupBy(ar1,ar1$agent),most_expensive=first(ar1$product)))

2016-09-06 07:29:50 sanaz
