2015-10-02

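Spark couldn't find window function

Using the solution provided in https://stackoverflow.com/a/32407543/5379015, I tried to create the same query, but using the programmatic (SQL string) syntax instead of the DataFrame API, as follows: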

import org.apache.spark.{SparkContext, SparkConf} 
import org.apache.spark.sql.hive.HiveContext 
import org.apache.spark.sql.expressions.Window 
import org.apache.spark.sql.functions._ 

object HiveContextTest {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("HiveContextTest")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    val df = sc.parallelize(
      ("foo", 1) :: ("foo", 2) :: ("bar", 1) :: ("bar", 2) :: Nil
    ).toDF("k", "v")

    // using dataframe api works fine
    val w = Window.partitionBy($"k").orderBy($"v")
    df.select($"k", $"v", rowNumber().over(w).alias("rn")).show

    // using programmatic syntax doesn't work
    df.registerTempTable("df")
    val w2 = sqlContext.sql("select k,v,rowNumber() over (partition by k order by v) as rn from df")
    w2.show()
  }
}

The first call, df.select($"k",$"v", rowNumber().over(w).alias("rn")).show, works fine, but w2.show() results in

Exception in thread "main" org.apache.spark.sql.AnalysisException: Couldn't find window function rowNumber; 

Does anyone have any idea how I can make this work with the programmatic syntax? Many thanks in advance.

Answer

The SQL equivalent of rowNumber is row_number:

SELECT k, v, row_number() OVER (PARTITION BY k ORDER BY v) AS rn FROM df
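
For context, here is a minimal sketch of how that query could slot into the program from the question. It assumes the same Spark 1.x setup with a HiveContext, launched via spark-submit. The object name RowNumberSqlExample and the value name ranked are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Hypothetical object name; the setup mirrors the question's code.
object RowNumberSqlExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RowNumberSqlExample")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    import sqlContext.implicits._

    // Same sample data as in the question
    val df = sc.parallelize(
      ("foo", 1) :: ("foo", 2) :: ("bar", 1) :: ("bar", 2) :: Nil
    ).toDF("k", "v")

    // Register the DataFrame so the SQL string can refer to it as "df"
    df.registerTempTable("df")

    // In the SQL string the window function is spelled row_number, not rowNumber
    val ranked = sqlContext.sql(
      "SELECT k, v, row_number() OVER (PARTITION BY k ORDER BY v) AS rn FROM df")
    ranked.show()

    sc.stop()
  }
}
```

The only change from the failing query is the function name inside the SQL string; the DataFrame setup and the window definition stay the same.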