Spark crashes when reading a JSON file while linked with aws-java-sdk

Let config.json be a small JSON file:
{
"toto": 1
}
I wrote a simple piece of code that reads the JSON file with sc.textFile (textFile is convenient since the file may live on S3, locally, or on HDFS):
import org.apache.spark.{SparkConf, SparkContext}

object testAwsSdk {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("test-aws-sdk").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    // textFile works transparently with local, HDFS and S3 paths
    val json = sc.textFile("config.json")
    println(json.collect().mkString("\n"))
    sc.stop()
  }
}
The sbt file pulls in only the spark-core library:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
)
The program works as expected, writing the contents of config.json to standard output.

Now I want to link against aws-java-sdk, Amazon's SDK for accessing S3:
libraryDependencies ++= Seq(
"com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
"org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
)
Running the same code, Spark throws the following exception:
Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"0","name":"textFile"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:133)
at org.apache.spark.rdd.RDDOperationScope$$anonfun$5.apply(RDDOperationScope.scala:133)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:133)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:1012)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:827)
at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:825)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
at testAwsSdk$.main(testAwsSdk.scala:11)
at testAwsSdk.main(testAwsSdk.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Reading the stack trace, it seems that when aws-java-sdk is linked, sc.textFile detects that the file is a JSON file and tries to parse it with Jackson, assuming a certain format which, of course, it cannot find. I need to link against aws-java-sdk, so my questions are:

1- Why does adding aws-java-sdk modify the behavior of spark-core?

2- Is there a workaround (the file can be on HDFS, S3, or local)?
This is because aws-java-sdk uses the latest version of the jackson library, 2.5.3, while spark uses the older version 2.4.4. I am facing the same issue but could not resolve it. If you have found a solution, kindly share it. Thanks. –
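One way to confirm such a clash in sbt is the built-in evicted task, which lists artifacts whose versions were overridden during resolution. A common sbt remedy, shown below as an untested sketch rather than a fix confirmed in this thread, is to pin the conflicting artifact back to the version spark-core was built against:

// build.sbt sketch: force the jackson version spark-core 1.5.1 expects.
// Run `evicted` in the sbt shell first to check that jackson-databind
// 2.4.4 is indeed being evicted by 2.5.3 coming from aws-java-sdk.
libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
  "org.apache.spark" %% "spark-core" % "1.5.1" % "compile"
)

// On sbt 0.13.x dependencyOverrides is a Set; on sbt 1.x use ++= Seq(...).
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4"
)

Whether aws-java-sdk 1.10.30 still works against jackson 2.4.4 would need testing.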
Hi Hafiz... pretty annoying, isn't it? I sent the case to AWS; they confirmed it is a compatibility issue, though they did not give me a clear solution. Will try to sort it out as soon as possible. – Boris
Hi Boris! Yes, it is annoying to face this issue, but I solved it by excluding the jackson core and jackson module libraries from spark-core and adding the latest jackson core library as a dependency. –
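A minimal build.sbt sketch of the exclusion approach described in the comment above, assuming the "latest" jackson is the 2.5.3 that aws-java-sdk 1.10.30 pulls in (the exact version that keeps Spark 1.5.1 happy may differ):

// build.sbt sketch: strip spark-core's own jackson artifacts, then add a
// single consistent jackson version shared by spark-core and aws-java-sdk.
libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.10.30" % "compile",
  ("org.apache.spark" %% "spark-core" % "1.5.1" % "compile")
    .excludeAll(
      ExclusionRule(organization = "com.fasterxml.jackson.core"),
      ExclusionRule(organization = "com.fasterxml.jackson.module")
    ),
  // Re-introduce jackson at one version (assumed 2.5.3 here):
  "com.fasterxml.jackson.core"   %  "jackson-databind"     % "2.5.3",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.5.3"
)

jackson-module-scala must match both the Scala binary version (%% handles that) and the jackson-databind version; otherwise the RDDOperationScope deserialization shown in the stack trace above keeps failing.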