2016-06-28

I'm trying to read a sample JSON file into a SQLContext using the code below, but it fails with the data source error that follows: Spark reading JSON is missing the json data source.

val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
val path = "C:\\samplepath\\sample.json" 
val jsondata = sqlContext.read.json(path) 

java.lang.ClassNotFoundException: Failed to find data source: json. Please find packages at http://spark-packages.org
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:77)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:102)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:109)
    at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:244)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: json.DefaultSource
    at scala.tools.nsc.interpreter.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:83)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4$$anonfun$apply$1.apply(ResolvedDataSource.scala:62)
    at scala.util.Try$.apply(Try.scala:161)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$4.apply(ResolvedDataSource.scala:62)
    at scala.util.Try.orElse(Try.scala:82)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.lookupDataSource(ResolvedDataSource.scala:62)
    ... 50 more

I tried looking for a Spark package that might be missing, but couldn't find anything that helped resolve the problem.

I tried similar code with PySpark, but it failed with a similar ClassNotFoundException for the json data source.

On a further attempt, converting an existing RDD to a JsonRDD worked, and I was able to get results successfully. Is there something I'm missing? I'm using Spark-1.6.1 with Scala-2.10.5. Any help is appreciated. Thanks.

val stringRDD = sc.parallelize(Seq(""" 
    { "isActive": false, 
    "balance": "$1,431.73", 
    "picture": "http://placehold.it/32x32", 
    "age": 35, 
    "eyeColor": "blue" 
    }""", 
    """{ 
    "isActive": true, 
    "balance": "$2,515.60", 
    "picture": "http://placehold.it/32x32", 
    "age": 34, 
    "eyeColor": "blue" 
    }""", 
    """{ 
    "isActive": false, 
    "balance": "$3,765.29", 
    "picture": "http://placehold.it/32x32", 
    "age": 26, 
    "eyeColor": "blue" 
    }""") 
) 
sqlContext.jsonRDD(stringRDD).registerTempTable("testjson") 
sqlContext.sql("SELECT age from testjson").collect 
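As an aside, `jsonRDD` is deprecated as of Spark 1.4; on Spark 1.6 the same result should be obtainable through the `DataFrameReader` API. A sketch, assuming the same `sqlContext` and `stringRDD` as above:

```scala
// Sketch: non-deprecated equivalent of the jsonRDD call above.
// Assumes sqlContext and stringRDD are defined as in the preceding snippet.
val df = sqlContext.read.json(stringRDD) // DataFrameReader.json(RDD[String])
df.registerTempTable("testjson")
sqlContext.sql("SELECT age FROM testjson").collect
```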

Answer


I had built the jar myself from source, so I believe the problem was some missing resource in that build. I downloaded the latest jar from the Spark website, and it worked as expected.
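For anyone hitting the same ClassNotFoundException, a quick sanity check after swapping in the official jar is to resolve the built-in source explicitly from spark-shell. A sketch, reusing the sample path from the question:

```scala
// Sketch: verify the built-in JSON data source resolves from its short name.
// Run in spark-shell against the replacement jar; path is the question's sample file.
val df = sqlContext.read.format("json").load("C:\\samplepath\\sample.json")
df.printSchema() // prints the inferred schema if the source was found
```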