DistCp from local Hadoop to Amazon S3

I am trying to use distcp to copy a folder from my local Hadoop cluster (CDH4) to my Amazon S3 bucket.

I use the following command:
hadoop distcp -log /tmp/distcplog-s3/ hdfs://nameserv1/tmp/data/sampledata s3n://hdfsbackup/
hdfsbackup is the name of my Amazon S3 bucket.

DistCp fails with an UnknownHostException:
13/05/31 11:22:33 INFO tools.DistCp: srcPaths=[hdfs://nameserv1/tmp/data/sampledata]
13/05/31 11:22:33 INFO tools.DistCp: destPath=s3n://hdfsbackup/
No encryption was performed by peer.
No encryption was performed by peer.
13/05/31 11:22:35 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 54 for hadoopuser on ha-hdfs:nameserv1
13/05/31 11:22:35 INFO security.TokenCache: Got dt for hdfs://nameserv1; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameserv1, Ident: (HDFS_DELEGATION_TOKEN token 54 for hadoopuser)
No encryption was performed by peer.
java.lang.IllegalArgumentException: java.net.UnknownHostException: hdfsbackup
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:295)
at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:282)
at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:503)
at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:487)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:130)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:111)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:85)
at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1046)
at org.apache.hadoop.tools.DistCp.copy(DistCp.java:666)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
Caused by: java.net.UnknownHostException: hdfsbackup
... 14 more
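Reading the stack trace, the failure happens before any data is copied: because the cluster is secured, DistCp's job setup asks every involved filesystem for delegation tokens, and for the s3n destination that path ends up treating the bucket name hdfsbackup as a hostname to resolve. A minimal sketch of the failing call, assuming a CDH4 classpath and the same secured configuration (the class name and renewer string here are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class S3nTokenRepro {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml
        FileSystem s3 = FileSystem.get(new Path("s3n://hdfsbackup/").toUri(), conf);
        // Mirrors TokenCache.obtainTokensForNamenodes from the stack trace;
        // with security enabled this tries to resolve "hdfsbackup" as a host
        // and should fail with the same UnknownHostException.
        s3.addDelegationTokens("renewer", new Credentials());
    }
}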
I have the AWS ID/secret configured in core-site.xml on all nodes:
<!-- Amazon S3 -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>MY-ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>MY-SECRET</value>
</property>

<!-- Amazon S3N -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>MY-ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>MY-SECRET</value>
</property>
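For what it's worth, since DistCp runs through ToolRunner (visible in the stack trace), the same credentials can also be supplied per invocation with Hadoop's generic -D options instead of core-site.xml. A sketch with the placeholder values from above; note this only affects authentication and would not by itself avoid the UnknownHostException, which is thrown during delegation-token setup:

hadoop distcp -Dfs.s3n.awsAccessKeyId=MY-ID \
    -Dfs.s3n.awsSecretAccessKey=MY-SECRET \
    hdfs://nameserv1/tmp/data/sampledata s3n://hdfsbackup/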
I am able to copy files out of HDFS with the cp command without any problem. The following command successfully copies the HDFS folder to S3:
hadoop fs -cp hdfs://nameserv1/tmp/data/sampledata s3n://hdfsbackup/
I know there is an Amazon S3-optimized version of DistCp available (s3distcp), but I don't want to use it because it doesn't support the update/overwrite options.
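For context, the update behavior I want is the legacy DistCp -update flag, which copies only files that are missing or differ at the destination. A sketch of the intended invocation, using the same paths as above:

hadoop distcp -update -log /tmp/distcplog-s3/ hdfs://nameserv1/tmp/data/sampledata s3n://hdfsbackup/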
Thanks. I suspect this is a security-related problem, based on another issue (https://issues.apache.org/jira/browse/HADOOP-8408) that has a similar exception stack trace. – Mohamed
Renaming the bucket so that it looks like a valid hostname didn't work for me. The patch fixed the problem. – Mohamed