val conf = new SparkConf()
.setMaster("local[1]")
.setAppName("Small")
.set("spark.executor.memory", "2g")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val df = sc.parallelize(Array((1,30),(2,10),(3,20),(1,10)(2,30))).toDF("books","readers")
val results = df.join(
df.select($"books" as "r_books", $"readers" as "r_readers"),
$"readers" === $"r_readers" and $"books" < $"r_books"
)
.groupBy($"books", $"r_books")
.agg($"books", $"r_books", count($"readers"))
Started from the SBT console with the following build.sbt:
name := "Small"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.1"
It returns this error:
scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with java.net.URLClassLoader@13a9a4f9 ...
Any ideas?
Thanks. So the incorrect use of 'count' is the cause of the 'scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection' error? Is that right? – zork
Well, it could be, since that is what the message seems to say, but it could also be that your array is malformed because you left out a comma. I just rewrote your code in IDEA without running it; if it works, please mark the answer as correct. – anquegi
In your code 'results' is a 'long' number, which is not what I need. I need to get a data frame where each record is 'book1, book2, cnt', where cnt is the number of times book1 and book2 were read together. – zork
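A minimal sketch of what that last comment asks for, assuming the same `df("books", "readers")` DataFrame and Spark 1.3-style `SQLContext` imports as in the question (untested here, since it needs a running SparkContext). Note the missing comma in the original `parallelize` call is restored, and the grouping columns are kept out of `agg`, so the result is a DataFrame of (book1, book2, cnt) rows rather than a bare count:

import org.apache.spark.sql.functions.count

// Restore the comma the question's code dropped between (1,10) and (2,30)
val df = sc.parallelize(Array((1,30),(2,10),(3,20),(1,10),(2,30)))
  .toDF("books","readers")

// Self-join on reader, keep each unordered pair once via books < r_books,
// then count how many readers each pair of books shares
val pairs = df.join(
    df.select($"books" as "r_books", $"readers" as "r_readers"),
    $"readers" === $"r_readers" && $"books" < $"r_books"
  )
  .groupBy($"books", $"r_books")
  .agg(count($"readers") as "cnt")

pairs.show()

Grouping columns are already included in the output of `groupBy(...).agg(...)`, so repeating `$"books"` and `$"r_books"` inside `agg` as the original code did is unnecessary.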