2017-04-18

Scala/Spark: multiplying a DataFrame column

I have a sample DataFrame:

df_that_I_have
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 50      | 1     |
+---------+---------+-------+
| Japan   | 20      | 3     |
+---------+---------+-------+
| India   | 20      | 1     |
+---------+---------+-------+
| Japan   | 10      | 3     |
+---------+---------+-------+

and I want a DataFrame that looks like this:

df_that_I_want
+---------+---------+-------+
| country | members | some  |
+---------+---------+-------+
| India   | 70      | 10    | // 5 * Sum of "some" for India, i.e. (1 + 1)
+---------+---------+-------+
| Japan   | 30      | 30    | // 5 * Sum of "some" for Japan, i.e. (3 + 3)
+---------+---------+-------+

For each country, the second DataFrame holds the sum of "members" and the sum of "some" multiplied by 5.

This is what I am doing to achieve it:

val df_that_I_want = df_that_I_have 
         .select(df_that_I_have("country"), 
           df_that_I_have.groupBy("country").sum("members"), 
           5 * df_that_I_have.groupBy("country").sum("some")) //Problem here 

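But the compiler won't let me do this, because apparently I can't multiply 5 by a column.

How can I multiply an integer value by the sum of "some" for each country?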

Answers


You can try the lit function.

scala> val df_that_I_have = Seq(("India",50,1),("India",20,1),("Japan",20,3),("Japan",10,3)).toDF("Country","Members","Some") 
df_that_I_have: org.apache.spark.sql.DataFrame = [Country: string, Members: int, Some: int] 

scala> val df1 = df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5)) 
df1: org.apache.spark.sql.DataFrame = [country: string, sum(members): bigint, ((sum(some),mode=Complete,isDistinct=false) * 5): bigint] 

scala> val df_that_I_want= df1.select($"Country",$"sum(Members)".alias("Members"), $"((sum(Some),mode=Complete,isDistinct=false) * 5)".alias("Some")) 
df_that_I_want: org.apache.spark.sql.DataFrame = [Country: string, Members: bigint, Some: bigint] 

scala> df_that_I_want.show 

+-------+-------+----+
|Country|Members|Some|
+-------+-------+----+
|  India|     70|  10|
|  Japan|     30|  30|
+-------+-------+----+

Please try this:

// Group first: selecting only "country" beforehand would drop "members" and "some"
df_that_I_have.groupBy("country").agg(sum("members"), sum("some") * lit(5))
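
The generated column name used in the select of the first answer (the one containing mode=Complete) is specific to the Spark version that produced that transcript, so it may not resolve on other versions. A minimal sketch that sidesteps this by aliasing each aggregate inside agg is shown below. It assumes Spark 2.x or later with a local SparkSession; the object name MultiplyAggregate and the helper sumAndScale are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{lit, sum}

object MultiplyAggregate {

  // Hypothetical helper (not from the original answers): sums "members" and
  // "some" per country, multiplies the "some" total by `factor`, and names
  // the results explicitly so no generated column names are needed.
  def sumAndScale(df: DataFrame, factor: Int): DataFrame =
    df.groupBy("country")
      .agg(
        sum("members").as("members"),
        (sum("some") * lit(factor)).as("some")
      )
      .orderBy("country") // deterministic row order for display

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multiply-aggregate")
      .getOrCreate()
    import spark.implicits._

    // Same sample rows as in the question.
    val df_that_I_have = Seq(
      ("India", 50, 1),
      ("Japan", 20, 3),
      ("India", 20, 1),
      ("Japan", 10, 3)
    ).toDF("country", "members", "some")

    val df_that_I_want = sumAndScale(df_that_I_have, 5)
    df_that_I_want.show()

    spark.stop()
  }
}
```

With the question's sample rows this should produce the same two rows as df_that_I_want. In spark-shell the session and implicits are already in scope, so only the groupBy(...).agg(...) chain is needed there.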