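Spark SQL: how to do arithmetic on column values and cast types in a select?
I'm using Spark SQL and DataFrames. Is there a way to write a select statement that does some arithmetic, just as you can in SQL?
For example, I have the following table: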
var data = Array((1, "foo", 30, 5), (2, "bar", 35, 3), (3, "foo", 25, 4))
var dataDf = sc.parallelize(data).toDF("id", "name", "value", "years")
dataDf.printSchema
// root
// |-- id: integer (nullable = false)
// |-- name: string (nullable = true)
// |-- value: integer (nullable = false)
// |-- years: integer (nullable = false)
dataDf.show()
// +---+----+-----+-----+
// | id|name|value|years|
// +---+----+-----+-----+
// |  1| foo|   30|    5|
// |  2| bar|   35|    3|
// |  3| foo|   25|    4|
// +---+----+-----+-----+
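Now I'd like to write a SELECT statement that creates a new column by doing some arithmetic on existing columns. For example, I want to compute the ratio value/years. I first need to convert value (or years) to a double. I tried this statement, but it doesn't compile: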
dataDf.
select(dataDf("name"), (dataDf("value").toDouble/dataDf("years")).as("ratio")).
show()
<console>:35: error: value toDouble is not a member of org.apache.spark.sql.Column
select(dataDf("name"), (dataDf("value").toDouble/dataDf("years")).as("ratio")).
I saw a similar question, "How to change column types in Spark SQL's DataFrame?", but it isn't quite what I want.
Thanks. The cast approach works: `dataDf.select(dataDf("name"), (dataDf("value").cast("double") / dataDf("years")).as("ratio")).show` – stackoverflowuser2010
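
For reference, here is a minimal sketch of the cast approach from the comment above, written as a standalone program. It assumes Spark 2.x or later with a local `SparkSession`, whereas the question uses the older `sc.parallelize(...).toDF` style. The object name `RatioExample` and the app name are made up for illustration. The second query shows what should be the equivalent SQL-style expression via `selectExpr`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical wrapper object, only here to make the sketch self-contained.
object RatioExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is acceptable for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ratio-example")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the question.
    val dataDf = Seq((1, "foo", 30, 5), (2, "bar", 35, 3), (3, "foo", 25, 4))
      .toDF("id", "name", "value", "years")

    // Column API: cast the Column itself instead of calling toDouble,
    // which is a method on Scala numbers, not on Column.
    dataDf
      .select(col("name"), (col("value").cast("double") / col("years")).as("ratio"))
      .show()

    // SQL expression form of the same projection.
    dataDf
      .selectExpr("name", "CAST(value AS DOUBLE) / years AS ratio")
      .show()

    spark.stop()
  }
}
```

Both forms build the expression lazily on `Column` objects, which is why a Scala conversion such as `toDouble` is not available there and `cast` is used instead.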