2017-10-18

Transfer data from Oracle to Hive using Spark

How can I use Spark to import data from an Oracle database into a DataFrame or RDD, and then write that data into some Hive table?

I have the following code:

import java.util.HashMap;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class ImportFromOracleToHive {

    public static void main(String[] args) {

        SparkConf conf = new SparkConf().setAppName("Data transfer test (Oracle -> Hive)").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        // JDBC options for the Oracle source table
        HashMap<String, String> options = new HashMap<>();
        options.put("url", "jdbc:oracle:thin:@<ip>:<port>:orcl");
        options.put("dbtable", "ACCOUNTS");
        options.put("user", "username");
        options.put("password", "12345");
        options.put("driver", "oracle.jdbc.OracleDriver");
        // numPartitions is ignored by the JDBC reader unless partitionColumn,
        // lowerBound and upperBound are also supplied
        options.put("numPartitions", "4");

        DataFrame oracleDataFrame = sqlContext.read()
            .format("jdbc")
            .options(options)
            .load();

    }
}
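For completeness, the write half I am trying to reach would presumably be a saveAsTable call once the HiveContext below works. A minimal sketch against the Spark 1.x API; the table name accounts_copy is only an example, not something from my setup:

// Sketch of the intended write; assumes oracleDataFrame was loaded
// through a working HiveContext rather than a plain SQLContext,
// since saveAsTable needs the Hive metastore to persist the table.
oracleDataFrame.write()
    .mode(SaveMode.Overwrite)       // org.apache.spark.sql.SaveMode
    .saveAsTable("accounts_copy");  // example name for the Hive target table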

If I create an instance of HiveContext in order to work with Hive:

HiveContext hiveContext = new HiveContext(sc); 

I get this error:

ERROR conf.Configuration: Failed to set setXIncludeAware(true) for parser oracle.xml.jaxp.JXDocumentBuilderFactory@...:java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory 
java.lang.UnsupportedOperationException: setXIncludeAware is not supported on this JAXP implementation or earlier: class oracle.xml.jaxp.JXDocumentBuilderFactory 
     at javax.xml.parsers.DocumentBuilderFactory.setXIncludeAware(DocumentBuilderFactory.java:614) 
     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2534) 
     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2503) 
     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2409) 
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1144) 
     at org.apache.hadoop.conf.Configuration.set(Configuration.java:1116) 
     at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:525) 
     at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:543) 
     at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:437) 
     at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:2750) 
     at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:2713) 
     at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:185) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) 
     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) 
     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 
     at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
     at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249) 
     at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:329) 
     at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:239) 
     at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:443) 
     at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:272) 
     at org.apache.spark.sql.SQLContext$$anonfun$4.apply(SQLContext.scala:271) 
     at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
     at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 
     at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) 
     at scala.collection.AbstractIterable.foreach(Iterable.scala:54) 
     at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:271) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101) 
     at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:103) 
     at replicator.ImportFromOracleToHive.init(ImportFromOracleToHive.java:52) 
     at replicator.ImportFromOracleToHive.main(ImportFromOracleToHive.java:76) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730) 
     at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) 
     at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) 
     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) 
     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

Answer

The problem appears to be an outdated Xerces dependency, as detailed in this question. My guess is that you are pulling it in transitively somewhere, but it is impossible to tell without seeing your pom.xml. Note from the stack trace that the error you posted comes from the Hadoop Common Configuration object, not from Spark itself. The fix is to make sure you are using a recent enough version:

<dependency> 
    <groupId>xerces</groupId> 
    <artifactId>xercesImpl</artifactId> 
    <version>2.11.0</version> 
</dependency> 
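If the stale parser is being dragged in next to the Oracle JDBC driver, another option is to exclude it at the source: the oracle.xml.jaxp.JXDocumentBuilderFactory class in your stack trace ships in Oracle's xmlparserv2 artifact, which registers itself as the default JAXP implementation. The coordinates below are an assumption on my part, since Oracle driver artifacts vary between installations; adjust them to whatever your pom.xml actually declares:

<dependency> 
    <groupId>com.oracle</groupId> 
    <artifactId>ojdbc6</artifactId> 
    <version>11.2.0.4</version> 
    <exclusions> 
        <!-- xmlparserv2 provides oracle.xml.jaxp.JXDocumentBuilderFactory, 
             which Hadoop's Configuration loader cannot use --> 
        <exclusion> 
            <groupId>com.oracle</groupId> 
            <artifactId>xmlparserv2</artifactId> 
        </exclusion> 
    </exclusions> 
</dependency> 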