
How can I compute both the count and the sum for each group? I am working from the following DataFrame:

import spark.implicits._  // needed for toDF and the 'symbol column syntax below
val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")
+---+-----+----+
| ID|Categ|Amnt|
+---+-----+----+
|  1|    A|  10|
|  2|    A|   5|
|  3|    B|  56|
+---+-----+----+

I want to get the number of IDs and the total amount per category:

+-----+-----+---------+
|Categ|count|sum(Amnt)|
+-----+-----+---------+
|    B|    1|       56|
|    A|    2|       15|
+-----+-----+---------+

Is it possible to get both the count and the sum without having to do a join?

client.groupBy("Categ").count
  .join(client.withColumnRenamed("Categ", "cat")
    .groupBy("cat")
    .sum("Amnt"), 'Categ === 'cat)
  .drop("cat")

Maybe something like this:

client.createOrReplaceTempView("client") 
spark.sql("SELECT Categ, count(Categ), sum(Amnt) FROM client GROUP BY Categ").show()
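
Presumably the count column would also need an alias so the header reads just count, as in the desired output; something along these lines:

// Alias count(Categ) so the header is "count"; sum(Amnt) already gets that name by default.
spark.sql("SELECT Categ, count(Categ) AS count, sum(Amnt) FROM client GROUP BY Categ").show()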

Answers


I'll give a different example than yours.

Multiple aggregate functions are possible like this; try it accordingly:

import org.apache.spark.sql.functions.{max, sum}

// In 1.3.x, in order for the grouping column "department" to show up,
// it must be included explicitly as part of the agg function call.
df.groupBy("department").agg($"department", max("age"), sum("expense"))

// In 1.4+, grouping column "department" is included automatically.
df.groupBy("department").agg(max("age"), sum("expense"))
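
Applied to your client DataFrame, the same pattern would look roughly like this (the alias on the count is just my way of matching the headers in your desired output):

import org.apache.spark.sql.functions.{count, sum}

// One pass over the data: count IDs and sum amounts per category.
client.groupBy("Categ")
  .agg(count("ID").alias("count"), sum("Amnt"))
  .show()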

You can do the aggregation on the table as given below:

import org.apache.spark.sql.functions.{count, sum}
client.groupBy("Categ").agg(sum("Amnt"), count("ID")).show()

+-----+---------+---------+
|Categ|sum(Amnt)|count(ID)|
+-----+---------+---------+
|    A|       15|        2|
|    B|       56|        1|
+-----+---------+---------+
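
For what it's worth, the same aggregation can also be written with the Map-style agg variant, which needs no imports from org.apache.spark.sql.functions; roughly:

// Map each column to the aggregate function applied to it; the result columns
// come out as sum(Amnt) and count(ID), as in the table above.
client.groupBy("Categ")
  .agg("Amnt" -> "sum", "ID" -> "count")
  .show()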