2016-12-12
library(sparklyr) 
library(dplyr) 
library(Lahman) 

spark_install(version = "2.0.0") 
sc <- spark_connect(master = "local") 

batting_tbl <- copy_to(sc, Lahman::Batting, "batting"); batting_tbl 

batting_tbl %>% arrange(-index()) 
# Error: org.apache.spark.sql.AnalysisException: Undefined function: 'INDEX'. 
# This function is neither a registered temporary 
# function nor a permanent function registered in the database 'default'.; line 3 pos 10 

Does anyone know how to sort a Spark (sparklyr) DataFrame by index using dplyr? I'm looking for a way to order a Spark DataFrame by its row index with sparklyr.

Answer

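This is the best solution I could come up with. Although it is correct, the sdf_with_unique_id function returns some very high sequence values once you get past roughly 62,000 rows. In any case, this is one way to create a distributed index column using sparklyr.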

library(sparklyr) 
library(dplyr) 
library(Lahman) 

options(tibble.width = Inf) 
options(dplyr.print_max = Inf) 

spark_install(version = "2.0.0") 
sc <- spark_connect(master = "local") 

batting_tbl <- copy_to(sc, Lahman::Batting, "batting"); batting_tbl 
tbl_uncache(sc, "batting") 

y <- Lahman::Batting 

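# Add a unique, increasing "id" column; the values are not consecutive
# and jump to much larger numbers after roughly 62,300 rows
batting_tbl <- batting_tbl %>% sdf_with_unique_id(id = "id")
# Sort by the new index column, highest first
batting_tbl %>% arrange(desc(id))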
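
The jump in values is most likely a partition boundary rather than a bug. As far as I can tell, sdf_with_unique_id relies on Spark's monotonically increasing ID, which Spark documents as putting the partition index in the upper bits and the row's position within the partition in the lower 33 bits. Below is a minimal sketch in plain R (no Spark needed) of that layout. The helper name `spark_style_id` and the two-partition split at 62,300 rows are assumptions made for illustration, not something read from the actual job.

```r
# Hypothetical helper: rebuilds an ID with the layout Spark documents for
# monotonically_increasing_id(): partition index in the upper bits,
# row offset within the partition in the lower 33 bits.
spark_style_id <- function(partition, offset) {
  partition * 2^33 + offset
}

# Assumed layout for illustration: two partitions of 62,300 rows each.
rows_per_partition <- 62300
ids <- c(
  spark_style_id(0, seq_len(rows_per_partition) - 1),
  spark_style_id(1, seq_len(rows_per_partition) - 1)
)

last_in_first  <- ids[rows_per_partition]      # 62299
first_in_next  <- ids[rows_per_partition + 1]  # 2^33 = 8589934592

# The IDs stay unique and keep row order, so sorting on them still works.
stopifnot(!anyDuplicated(ids), !is.unsorted(ids))
```

So the IDs are safe to sort on, they are just not consecutive. If you need a gap-free 1, 2, 3, ... index, newer sparklyr releases appear to offer sdf_with_sequential_id for that; check that your installed version provides it before relying on it.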