2016-10-03 35 views
1

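Sparkling Water: unable to use the Spark ML pipeline support

According to this blog post from the Sparkling Water team, the latest release lets you build DL models using Spark ML pipeline components. I tried adding the latest version to my build.sbt: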

"org.apache.spark" % "spark-mllib_2.10" % "2.0.0" % "provided", 
"ai.h2o" % "sparkling-water-core_2.10" % "1.6.5" % "provided" 

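but with no luck: importing org.apache.spark.ml.h2o.H2OPipeline does not work. The h2o package inside spark.ml does not seem to exist in the jar, even though it appears in the link above and also here. I would really like to reuse my spark-mllib feature transformers to build a DL model with h2o, as shown in the blog.

Any help is appreciated!

Thanks.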


Not sure if this is the problem, but you are using Spark 2.0 with Sparkling Water 1.6.5; you should use the recently released Sparkling Water 2.0. –


I doubt that. https://mvnrepository.com/artifact/ai.h2o/sparkling-water-core_2.10 only lists releases up to 1.6.8. – void


Also, are we talking about the package missing from 'org.apache.spark.ml'? – void

Answers

2

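1) Please do not use Spark 2 with SW 1.6.5; it will not work. We publish SW 2.0 for Scala 2.11: https://mvnrepository.com/artifact/ai.h2o/sparkling-water-core_2.11

2) You are adding only SW core to your build; the classes you are looking for are in sparkling-water-ml: https://mvnrepository.com/artifact/ai.h2o/sparkling-water-ml_2.11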
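As a rough sketch of what these two points imply for the build.sbt in the question, assuming Scala 2.11 and a Sparkling Water 2.0.x release. The exact version strings below are assumptions, so check the Maven pages linked above for what is actually published:

```scala
// build.sbt (sketch; version numbers are assumptions, not verified against your setup)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (_2.11) to the artifact name
  "org.apache.spark" %% "spark-mllib"          % "2.0.0" % "provided",
  // core alone does not contain the org.apache.spark.ml.h2o classes
  "ai.h2o"           %% "sparkling-water-core" % "2.0.0",
  // the pipeline classes live in the separate sparkling-water-ml artifact
  "ai.h2o"           %% "sparkling-water-ml"   % "2.0.0"
)
```

Whether the Sparkling Water artifacts should be marked "provided" depends on how the job is submitted; they are left at the default scope here so the classes are available at compile and run time.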


Thanks! It would be great if you could also point to some documentation on the ML pipeline support in Sparkling Water. – void


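@void Unfortunately, we don't have much documentation, since we are still implementing this and consider it an experimental feature. For now we support only H2O's GBM and DeepLearning as part of a pipeline (here is some example code: https://github.com/h2oai/sparkling-water/blob/master/examples/pipelines/hamOrSpam.script.scala). We are very open to contributions :-) –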


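Thank you very much for the reply! Could you point me to any issues (JIRA) related to adding support for more algorithms to the pipeline? I would like to follow them. – void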

0

I ran the H2O example with a Maven pom.xml using the following versions, and it works:

  • Spark - 1.6
  • Sparkling Water - 1.6.8
  • H2O (ai.h2o) - 3.10.0.8

Here is the Maven pom.xml (see the Git repo: https://github.com/seerampavan/H2oTesting/blob/master/pom.xml):

<properties> 
    <spark.version>1.6.0-cdh5.7.1</spark.version> 
    <scala.version>2.10.4</scala.version> 
    <scala.binary.version>2.10</scala.binary.version> 
    <top.dir>${project.basedir}/..</top.dir> 
    <hadoop.version>2.6.0-cdh5.7.1</hadoop.version> 
</properties> 

<dependencies> 
    <!-- Force import of Spark's servlet API for unit tests --> 
    <dependency> 
     <groupId>javax.servlet</groupId> 
     <artifactId>javax.servlet-api</artifactId> 
     <version>3.0.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-library</artifactId> 
     <version>${scala.version}</version> 
     <!--<scope>provided</scope>--> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 

     <exclusions> 
      <exclusion> 
       <!-- make sure wrong scala version is not pulled in --> 
       <groupId>org.scala-lang</groupId> 
       <artifactId>scala-library</artifactId> 
      </exclusion> 
      <exclusion> 
       <!-- make sure wrong scala version is not pulled in --> 
       <groupId>org.scala-lang</groupId> 
       <artifactId>scalap</artifactId> 
      </exclusion> 
     </exclusions> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-sql_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 

    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-hive_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 

    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 

     <exclusions> 
      <exclusion> 
       <groupId>org.jpmml</groupId> 
       <artifactId>pmml-model</artifactId> 
      </exclusion> 
     </exclusions> 

    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-streaming_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 

    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-streaming-kafka_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 

    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-streaming_${scala.binary.version}</artifactId> 
     <version>${spark.version}</version> 
     <type>test-jar</type> 
     <classifier>tests</classifier> 

    </dependency> 
    <dependency> 
     <groupId>org.scalatest</groupId> 
     <artifactId>scalatest_${scala.binary.version}</artifactId> 
     <version>2.2.1</version> 

    </dependency> 
    <dependency> 
     <groupId>junit</groupId> 
     <artifactId>junit</artifactId> 
     <version>4.12</version> 

    </dependency> 

    <dependency> 
     <groupId>org.apache.hadoop</groupId> 
     <artifactId>hadoop-client</artifactId> 
     <version>${hadoop.version}</version> 
     <exclusions> 
      <exclusion> 
       <groupId>log4j</groupId> 
       <artifactId>log4j</artifactId> 
      </exclusion> 
      <exclusion> 
       <groupId>javax.servlet</groupId> 
       <artifactId>servlet-api</artifactId> 
      </exclusion> 
      <exclusion> 
       <groupId>javax.servlet.jsp</groupId> 
       <artifactId>jsp-api</artifactId> 
      </exclusion> 
      <exclusion> 
       <groupId>org.jruby</groupId> 
       <artifactId>jruby-complete</artifactId> 
      </exclusion> 
      <exclusion> 
       <groupId>org.jboss.netty</groupId> 
       <artifactId>netty</artifactId> 
      </exclusion> 
      <exclusion> 
       <groupId>io.netty</groupId> 
       <artifactId>netty</artifactId> 
      </exclusion> 
     </exclusions> 
    </dependency> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-reflect</artifactId> 
     <version>2.10.5</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-web</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-scala_2.10</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-persist-s3</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-persist-hdfs</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-parquet-parser</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-genmodel</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-core</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-bindings</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-avro-parser</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-app</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>h2o-algos</artifactId> 
     <version>3.10.0.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>sparkling-water-repl_2.10</artifactId> 
     <version>1.6.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>sparkling-water-ml_2.10</artifactId> 
     <version>1.6.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>sparkling-water-examples_2.10</artifactId> 
     <version>1.6.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>sparkling-water-core_2.10</artifactId> 
     <version>1.6.8</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>deepwater-backend-api</artifactId> 
     <version>1.0.0</version> 
    </dependency> 

    <dependency> 
     <groupId>joda-time</groupId> 
     <artifactId>joda-time</artifactId> 
     <version>2.9.2</version> 
    </dependency> 
    <dependency> 
     <groupId>org.joda</groupId> 
     <artifactId>joda-convert</artifactId> 
     <version>1.8.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.javassist</groupId> 
     <artifactId>javassist</artifactId> 
     <version>3.22.0-CR1</version> 
    </dependency> 
    <dependency> 
     <groupId>gov.nist.math</groupId> 
     <artifactId>jama</artifactId> 
     <version>1.0.3</version> 
    </dependency> 
    <dependency> 
     <groupId>com.google.code.gson</groupId> 
     <artifactId>gson</artifactId> 
     <version>2.7</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>reflections</artifactId> 
     <version>0.9.11-h2o-custom</version> 
    </dependency> 
    <dependency> 
     <groupId>ai.h2o</groupId> 
     <artifactId>google-analytics-java</artifactId> 
     <version>1.1.2-H2O-CUSTOM</version> 
    </dependency> 
    <dependency> 
     <groupId>com.github.tony19</groupId> 
     <artifactId>named-regexp</artifactId> 
     <version>0.2.4</version> 
    </dependency> 
    <dependency> 
     <groupId>com.amazonaws</groupId> 
     <artifactId>aws-java-sdk-s3</artifactId> 
     <version>1.11.45</version> 
    </dependency> 
    <dependency> 
     <groupId>com.amazonaws</groupId> 
     <artifactId>aws-java-sdk-kms</artifactId> 
     <version>1.11.45</version> 
    </dependency> 
    <dependency> 
     <groupId>com.amazonaws</groupId> 
     <artifactId>aws-java-sdk-core</artifactId> 
     <version>1.11.45</version> 
    </dependency> 
    <dependency> 
     <groupId>org.eclipse.jetty.aggregate</groupId> 
     <artifactId>jetty-servlet</artifactId> 
     <version>8.2.0.v20160908</version> 
    </dependency> 
    <dependency> 
     <groupId>org.eclipse.jetty.aggregate</groupId> 
     <artifactId>jetty-server</artifactId> 
     <version>8.2.0.v20160908</version> 
    </dependency> 
    <dependency> 
     <groupId>org.eclipse.jetty.aggregate</groupId> 
     <artifactId>jetty-plus</artifactId> 
     <version>8.1.17.v20150415</version> 
    </dependency> 
</dependencies>
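The pairing above (Spark 1.6 with Sparkling Water 1.6.8) and the earlier advice not to mix Spark 2 with SW 1.6.5 both suggest that the Sparkling Water major.minor version should match Spark's. A minimal sketch of that check, assuming this convention holds for the releases you use; `VersionCheck`, `majorMinor` and `compatible` are hypothetical helper names, not part of any library:

```scala
// Sketch: check that a Sparkling Water version targets the same Spark line.
// Assumption: Sparkling Water x.y.z is built for Spark x.y (e.g. 1.6.8 -> Spark 1.6).
object VersionCheck {
  // Matches a leading "major.minor" and ignores the rest, e.g. ".0-cdh5.7.1"
  private val MajorMinor = """^(\d+)\.(\d+).*""".r

  /** Extracts "major.minor" from a version string, or None if it does not parse. */
  def majorMinor(version: String): Option[String] = version match {
    case MajorMinor(major, minor) => Some(s"$major.$minor")
    case _                        => None
  }

  /** True only when both versions parse and share the same major.minor prefix. */
  def compatible(spark: String, sparklingWater: String): Boolean =
    (majorMinor(spark), majorMinor(sparklingWater)) match {
      case (Some(a), Some(b)) => a == b
      case _                  => false
    }
}
```

For example, `VersionCheck.compatible("1.6.0-cdh5.7.1", "1.6.8")` matches the combination in the pom.xml above, while `VersionCheck.compatible("2.0.0", "1.6.5")` flags the combination from the question.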