2
我試圖在EMR集羣上運行hadoop作業。它正在作爲我使用jar-with-dependencies
的Java命令運行。這項工作從Teradata中提取數據,我認爲Teradata相關的jar也包含在jar-with-dependencies中。不過,我仍然得到異常:指定AWS EMR自定義jar應用程序中的其他jar
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.setConf(DBInputFormat.java:171)
我pom
具有以下相關依存關係:
<dependency>
<groupId>teradata</groupId>
<artifactId>terajdbc4</artifactId>
<version>14.10.00.17</version>
</dependency>
<dependency>
<groupId>teradata</groupId>
<artifactId>tdgssconfig</artifactId>
<version>14.10.00.17</version>
</dependency>
我包裝完整的水瓶中下:
<build>
<plugins>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
<compilerArgument>-Xlint:-deprecation</compilerArgument>
</configuration>
</plugin>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.2.1</version>
<configuration>
<descriptors>
</descriptors>
<archive>
<manifest>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
assembly.xml
文件:
<assembly>
<id>aws-emr</id>
<formats>
<format>jar</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<dependencySets>
<dependencySet>
<unpack>false</unpack>
<includes>
</includes>
<scope>runtime</scope>
<outputDirectory>lib</outputDirectory>
</dependencySet>
<dependencySet>
<unpack>true</unpack>
<includes>
<include>${groupId}:${artifactId}</include>
</includes>
</dependencySet>
</dependencySets>
</assembly>
運行EMR命令:
aws emr create-cluster --release-label emr-5.3.1 \
--instance-groups \
InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
InstanceGroupType=CORE,InstanceCount=5,BidPrice=0.1,InstanceType=m3.xlarge \
--service-role EMR_DefaultRole --log-uri s3://my-bucket/logs \
--applications Name=Hadoop --name TeradataPullerTest \
--ec2-attributes <ec2-attributes> \
--steps Type=CUSTOM_JAR,Name=EventsPuller,Jar=s3://path-to-jar-with-dependencies.jar,\
Args=[com.my.package.EventsPullerMR],ActionOnFailure=TERMINATE_CLUSTER \
--auto-terminate
有沒有我可以指定Teradata的罐子在執行的map-reduce任務,使得它們添加到類路徑的方法嗎?
編輯:我確認缺少的類是打包在jar-with-dependencies中的。
aws-emr$ jar tf target/aws-emr-0.0.1-SNAPSHOT-jar-with-dependencies.jar | grep TeraDriver
com/ncr/teradata/TeraDriver.class
com/teradata/jdbc/TeraDriver.class