Hadoop 2.5.0 fails to write a file remotely

I'm having trouble putting files into an HDFS 2.5.0 single-node Hadoop Docker container remotely via the Hadoop Java API. When I run on the Hadoop system itself, I can copy a local file into HDFS without a problem, but when I try to write the data remotely I get the following exception:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/root/books/beowulf.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. 
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791) 
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) 
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 

    at org.apache.hadoop.ipc.Client.call(Client.java:1411) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1364) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) 
    at com.sun.proxy.$Proxy14.addBlock(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) 
    at com.sun.proxy.$Proxy14.addBlock(Unknown Source) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:368) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1449) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270) 
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:526) 

I don't see any errors in the datanode log, but I do see the corresponding error in the namenode log:

2014-11-04 14:19:26,111 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 3 Total time for transactions(ms): 13 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 10 
2014-11-04 14:19:26,801 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds 
2014-11-04 14:19:26,802 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 
2014-11-04 14:19:27,136 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/root/books/beowulf.txt. BP-342727372-10.0.0.17-1414068411758 blk_1073741852_1028{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-511723cb-ff72-4585-bb81-90a2e1f154a3:NORMAL|RBW]]} 
2014-11-04 14:19:50,859 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 1. For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy 
2014-11-04 14:19:50,860 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 192.168.56.1:3805 Call#4 Retry#0 
java.io.IOException: File /user/root/books/beowulf.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation. 
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1471) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2791) 
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:606) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:455) 
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007) 

As far as I can tell, the exception only occurs after the FSDataOutputStream is closed.

Here is the code I'm using that reproduces the problem:

import com.spectralogic.ds3.hadoop.HadoopConstants; 
import org.apache.commons.io.IOUtils; 
import org.apache.hadoop.conf.Configuration; 
import org.apache.hadoop.fs.FSDataOutputStream; 
import org.apache.hadoop.fs.FileSystem; 
import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.security.UserGroupInformation; 

import java.io.IOException; 
import java.io.InputStream; 
import java.security.PrivilegedExceptionAction; 

public class HdfsPutFile {
    public static void main(final String[] args) throws IOException, InterruptedException {

        final Configuration conf = new Configuration();
        final UserGroupInformation usgi = UserGroupInformation.createRemoteUser("root");

        // Perform all HDFS operations as the remote "root" user.
        usgi.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
                conf.set(HadoopConstants.FS_DEFAULT_NAME, "hdfs://192.168.56.102:9000");
                conf.set(HadoopConstants.HADOOP_JOB_UGI, "root");

                try (final FileSystem hdfs = FileSystem.get(conf)) {

                    System.out.printf("Total Used Hdfs Storage: %d\n", hdfs.getStatus().getUsed());

                    final String resourceName = "books/beowulf.txt";
                    final Path path = new Path("/user/root", resourceName);

                    // Copy the classpath resource into a new HDFS file,
                    // overwriting any existing file at that path.
                    try (final InputStream inputStream = HdfsPutFile.class.getClassLoader().getResourceAsStream(resourceName);
                         final FSDataOutputStream outputStream = hdfs.create(path, true)) {

                        IOUtils.copy(inputStream, outputStream);
                    }
                }
                return null;
            }
        });
    }
}

A similar problem is discussed here: http://stackoverflow.com/questions/10097246/no-data-nodes-are-started – Hans 2014-11-04 19:59:25

Answers

It turns out this was failing because my code couldn't reach the datanode: since the datanode runs inside a Docker container, the address it registers with the namenode is the container's internal IP, which is unreachable from outside. When I ran the code from inside the container instead, the put succeeded.
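One way to see this from the client machine (an illustrative check, not part of the original answer) is to ask the namenode which address the datanode registered:

    hdfs dfsadmin -report
    # The "Name:" line for the datanode shows the address the namenode
    # hands out to clients; here it is the Docker-internal IP, which a
    # remote client cannot reach.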

So when Hadoop runs inside Docker and you want to talk to it remotely, you need to use -p to publish some of Hadoop's ports to the host, as in the sketch below.
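A minimal sketch of such a run; the image name is a placeholder, and the ports listed are the standard Hadoop 2.x defaults (9000 matches the fs.defaultFS used in the question, and 50010 is the datanode data-transfer port that the failing write actually needs):

    docker run -d --name hadoop \
        -p 9000:9000 \
        -p 50070:50070 \
        -p 50010:50010 \
        -p 50020:50020 \
        my-hadoop-2.5.0-image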

And to tell the Hadoop client to address datanodes by hostname instead of by IP, add the following block to the client's hdfs-site.xml, which sets dfs.client.use.datanode.hostname to true:
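    <property>
        <name>dfs.client.use.datanode.hostname</name>
        <value>true</value>
    </property>

Equivalently, the same flag can be set on the Configuration object in the question's code instead of in hdfs-site.xml (a sketch, not from the original answer):

    final Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://192.168.56.102:9000");
    // Tell the HDFS client to connect to datanodes by the hostname they
    // registered, rather than the Docker-internal IP the namenode reports.
    conf.set("dfs.client.use.datanode.hostname", "true");

Note that the client machine must then be able to resolve the container's hostname, for example via an /etc/hosts entry.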