2015-12-02

java.io.IOException: No FileSystem for scheme: hdfs

I am using Cloudera's QuickStart VM with CDH 5.3.0 (installed from parcels) and Spark 1.2.0 ($SPARK_HOME=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark), and I am submitting the Spark application using the command:

./bin/spark-submit --class <Spark_App_Main_Class_Name> --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/<Spark_App_Target_Jar_Name>.jar

Spark_App_Main_Class_Name.scala

import org.apache.spark.SparkContext 
import org.apache.spark.SparkConf 
import org.apache.spark.mllib.util.MLUtils 


object Spark_App_Main_Class_Name {

    def main(args: Array[String]) {
        val hConf = new SparkConf()
            // map the "hdfs" and "file" URI schemes to their FileSystem implementations explicitly
            .set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
            .set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
        val sc = new SparkContext(hConf)
        // load a LIBSVM-formatted dataset from HDFS
        val data = MLUtils.loadLibSVMFile(sc, "hdfs://localhost.localdomain:8020/analytics/data/mllib/sample_libsvm_data.txt")
        ...
    }

}

However, I am getting a ClassNotFoundException for org.apache.hadoop.hdfs.DistributedFileSystem while spark-submitting in client mode:

[cloudera@localhost bin]$ ./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar 
15/11/30 09:46:34 INFO SparkContext: Spark configuration: 
spark.app.name=Spark_App_Main_Class_Name 
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native 
spark.eventLog.dir=hdfs://localhost.localdomain:8020/user/spark/applicationHistory 
spark.eventLog.enabled=true 
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native 
spark.executor.memory=4G 
spark.jars=file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar 
spark.logConf=true 
spark.master=spark://localhost.localdomain:7077 
spark.yarn.historyServer.address=http://localhost.localdomain:18088 
15/11/30 09:46:34 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.113.234.150 instead (on interface eth12) 
15/11/30 09:46:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address 
15/11/30 09:46:34 INFO SecurityManager: Changing view acls to: cloudera 
15/11/30 09:46:34 INFO SecurityManager: Changing modify acls to: cloudera 
15/11/30 09:46:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera) 
15/11/30 09:46:35 INFO Slf4jLogger: Slf4jLogger started 
15/11/30 09:46:35 INFO Remoting: Starting remoting 
15/11/30 09:46:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.113.234.150:59473] 
15/11/30 09:46:35 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@10.113.234.150:59473] 
15/11/30 09:46:35 INFO Utils: Successfully started service 'sparkDriver' on port 59473. 
15/11/30 09:46:36 INFO SparkEnv: Registering MapOutputTracker 
15/11/30 09:46:36 INFO SparkEnv: Registering BlockManagerMaster 
15/11/30 09:46:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20151130094636-8c3d 
15/11/30 09:46:36 INFO MemoryStore: MemoryStore started with capacity 267.3 MB 
15/11/30 09:46:38 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7d1f2861-a568-4919-8f7e-9a9fe6aab2b4 
15/11/30 09:46:38 INFO HttpServer: Starting HTTP Server 
15/11/30 09:46:38 INFO Utils: Successfully started service 'HTTP file server' on port 50003. 
15/11/30 09:46:38 INFO Utils: Successfully started service 'SparkUI' on port 4040. 
15/11/30 09:46:38 INFO SparkUI: Started SparkUI at http://10.113.234.150:4040 
15/11/30 09:46:39 INFO SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar at http://10.113.234.150:50003/jars/Spark_App_Target_Jar_Name.jar with timestamp 1448894799228 
15/11/30 09:46:39 INFO AppClient$ClientActor: Connecting to master spark://localhost.localdomain:7077... 
15/11/30 09:46:40 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151130094640-0000 
15/11/30 09:46:41 INFO NettyBlockTransferService: Server created on 56458 
15/11/30 09:46:41 INFO BlockManagerMaster: Trying to register BlockManager 
15/11/30 09:46:41 INFO BlockManagerMasterActor: Registering block manager 10.113.234.150:56458 with 267.3 MB RAM, BlockManagerId(<driver>, 10.113.234.150, 56458) 
15/11/30 09:46:41 INFO BlockManagerMaster: Registered BlockManager 
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found 
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047) 
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) 
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90) 
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92) 
    at Spark_App_Main_Class_Name$.main(Spark_App_Main_Class_Name.scala:22) 
    at Spark_App_Main_Class_Name.main(Spark_App_Main_Class_Name.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found 
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953) 
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045) 
    ... 16 more 

It appears that the Spark application is not able to map the hdfs scheme, because initially I was getting the error:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs 
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584) 
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591) 
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367) 
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90) 
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352) 
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92) 
    at LogisticRegressionwithBFGS$.main(LogisticRegressionwithBFGS.scala:21) 
    at LogisticRegressionwithBFGS.main(LogisticRegressionwithBFGS.scala) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) 
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) 
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 

and I followed hadoop No FileSystem for scheme: file and added "fs.hdfs.impl" and "fs.file.impl" to the Spark configuration settings, as shown in the code above.
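(An aside on that workaround: plain SparkConf keys are not copied into the Hadoop Configuration that Spark builds internally; in Spark 1.x they only propagate when given the spark.hadoop. prefix. A minimal sketch of that variant, assuming the prefix mechanism applies in this Spark version:)

val hConf = new SparkConf()
    // assumption: the spark.hadoop. prefix forwards these keys into the Hadoop Configuration
    .set("spark.hadoop.fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    .set("spark.hadoop.fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)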

Answers

0

I got past this problem after some detailed searching and trying out different approaches. Basically, the problem seems to be caused by the Hadoop HDFS jars being unavailable when submitting the Spark application: the dependency jars could not be found, even when using maven-assembly-plugin or the maven-jar-plugin/maven-dependency-plugin combination.

With the maven-jar-plugin/maven-dependency-plugin combination, the main-class jar and the dependency jars were getting created, but supplying the dependency jars via the --jars option still resulted in the same error, as follows:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars ../apps/Spark_App_Target_Jar_Name-dep.jar ../apps/Spark_App_Target_Jar_Name.jar 

Using the maven-shade-plugin, as suggested by "krookedking" in hadoop-no-filesystem-for-scheme-file, seems to hit the problem at the right point: creating a single jar file containing the main class and all dependent classes eliminated the classpath issues.

My final working spark-submit command reads as follows:

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar 

The maven-shade-plugin configuration in my project pom.xml is as follows:

<plugin> 
     <groupId>org.apache.maven.plugins</groupId> 
     <artifactId>maven-shade-plugin</artifactId> 
     <version>2.4.2</version> 
     <executions> 
      <execution> 
       <phase>package</phase> 
       <goals> 
        <goal>shade</goal> 
       </goals> 
       <configuration> 
        <filters> 
         <filter> 
          <artifact>*:*</artifact> 
          <excludes> 
           <exclude>META-INF/*.SF</exclude> 
           <exclude>META-INF/*.DSA</exclude> 
           <exclude>META-INF/*.RSA</exclude> 
          </excludes> 
         </filter> 
        </filters> 
        <transformers> 
         <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/> 
        </transformers> 
       </configuration> 
      </execution> 
     </executions> 
     </plugin> 
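(The ServicesResourceTransformer is the part that matters for this error: it merges the META-INF/services/org.apache.hadoop.fs.FileSystem descriptors from hadoop-common and hadoop-hdfs instead of letting one overwrite the other in the uber-jar, so the hdfs scheme stays registered. A quick sanity check against the shaded jar could look like the sketch below; CheckHdfsScheme is a hypothetical helper class, not part of the application above.)

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem

object CheckHdfsScheme {
    def main(args: Array[String]) {
        // resolves the implementation registered for the "hdfs" scheme via the
        // merged META-INF/services entries (or an explicit fs.hdfs.impl override)
        val cls = FileSystem.getFileSystemClass("hdfs", new Configuration())
        println(cls.getName) // expect org.apache.hadoop.hdfs.DistributedFileSystem
    }
}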

Note: the filter excludes above are needed to get rid of

java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

which is raised because the *.SF/*.DSA/*.RSA signature files copied in from signed dependency jars are no longer valid for the merged uber-jar.
7

You need to include the hadoop-hdfs-2.x jars (maven link) in your classpath. When submitting your application, mention the additional jar location using spark-submit's --jars option.
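(If the project were built with sbt instead of Maven, a rough equivalent for pulling that jar in would be the sketch below; the CDH-flavored version string and the Cloudera repository URL are assumptions that must be matched against your parcel:)

// build.sbt (sketch)
resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.5.0-cdh5.3.0"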

On another note, you should ideally be moving to CDH 5.5, which has Spark 1.5.

+0

Added the hadoop-hdfs jars with the --jars option while spark-submitting, but it still throws a java.lang.ClassNotFoundException – somnathchakrabarti

+0

Somnath, can you provide the complete spark-submit command? –

+0

./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars /opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*.jar ../apps/Spark_App_Target_Jar_Name.jar resolved the ClassNotFoundException, but I don't see any completed application in the Spark Master WebUI – somnathchakrabarti

-1

I faced the same problem when running Spark code from an IDE and accessing a remote HDFS.
So I set the following configuration, and it was resolved.

// conf is the application's SparkConf, defined elsewhere
JavaSparkContext jsc = new JavaSparkContext(conf);
Configuration hadoopConfig = jsc.hadoopConfiguration();
// map the "hdfs" and "file" schemes to their FileSystem implementations explicitly
hadoopConfig.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
hadoopConfig.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
+0

Please add some context to your answer. Explain how it solves the problem; otherwise you risk downvotes and/or deletion –

+0

And at least fix the indentation –
