2014-05-23 16 views
4

I've run into a problem with my Hadoop application. Crashed HDFS client - how do I close the remaining open files?

Whenever my client exits without closing its files (for example because of a system crash), the files remain open in Hadoop.

When I then restart the client and try to reopen these files to append data, the append fails. (See the exception message below.)

Is there a way to close these files manually, or better, a good way to check for them and close them right before restarting?

I'm using Cloudera CDH5 (2.3.0-cdh5.0.0).


These are my open files after the client exited unexpectedly:

$ hadoop fsck -openforwrite /

[[email protected] ~]# su hdfs -c 'hadoop fsck -openforwrite /' 
Connecting to namenode via http://cloudera:50070 
FSCK started by hdfs (auth:SIMPLE) from /127.0.0.1 for path / at Fri May 23 08:04:20 PDT 2014 
../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052100 11806743 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052103 11648439 bytes, 1 block(s), OPENFORWRITE: ..../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052108 11953116 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052109 12047982 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052113 12010734 bytes, 1 block(s), OPENFORWRITE: ........../tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 11674047 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052100 11995602 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052101 12257502 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052104 11964174 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052108 11777061 bytes, 1 block(s), OPENFORWRITE: /tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052109 12000840 bytes, 1 block(s), OPENFORWRITE: ......./tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052117 12041871 bytes, 1 block(s), OPENFORWRITE: .../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052121 12129462 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game2/month=201405/day=20140521/events_2014052124 11856213 bytes, 1 block(s), OPENFORWRITE: ....../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052106 11863488 bytes, 1 block(s), OPENFORWRITE: ....../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052113 11707803 bytes, 1 block(s), OPENFORWRITE: ./tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052115 11690052 bytes, 1 block(s), OPENFORWRITE: ../tmp/event_consumer_test/game=game3/month=201405/day=20140521/events_2014052118 11898117 bytes, 1 block(s), OPENFORWRITE: ........../tmp/logs/hdfs/logs/application_1400845529689_0013/cloudera_8041 0 bytes, 0 block(s), OPENFORWRITE: .................. 
......................................../user/history/done_intermediate/hdfs/job_1400845529689_0007.summary_tmp 0 bytes, 0 block(s), OPENFORWRITE: ........................................................... 
.................................................................................................... 
................................................Status: HEALTHY 
Total size: 1080902001 B 
Total dirs: 68 
Total files: 348 
Total symlinks:  0 
Total blocks (validated): 344 (avg. block size 3142156 B) 
Minimally replicated blocks: 344 (100.0 %) 
Over-replicated blocks: 0 (0.0 %) 
Under-replicated blocks: 0 (0.0 %) 
Mis-replicated blocks:  0 (0.0 %) 
Default replication factor: 1 
Average block replication: 1.0 
Corrupt blocks:  0 
Missing replicas:  0 (0.0 %) 
Number of data-nodes:  1 
Number of racks:  1 
FSCK ended at Fri May 23 08:04:20 PDT 2014 in 25 milliseconds 


The filesystem under path '/' is HEALTHY 

The code (reduced to the relevant part) that creates and writes the files:

Path path = new Path(filename); 

// create an empty file first if it does not exist yet 
if (!this.fs.exists(path)) { 
    this.fs.create(path).close(); 
} 

// reopen the file for appending and write the message 
OutputStream out = this.fs.append(path); 
out.write(... message ...); 

IOUtils.closeStream(out); 

The exception I get when trying to write to one of the open files:

Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): failed to create file /tmp/event_consumer_test/game=game1/month=201405/day=20140521/events_2014052124 for DFSClient_NONMAPREDUCE_-1420767882_1 on client 127.0.0.1 because current leaseholder is trying to recreate file. 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2458) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInternal(FSNamesystem.java:2340) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFileInt(FSNamesystem.java:2569) 
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2532) 
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:522) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:373) 
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) 
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986) 
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:415) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1409) 
    at org.apache.hadoop.ipc.Client.call(Client.java:1362) 
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206) 
    at com.sun.proxy.$Proxy9.append(Unknown Source) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) 
    at com.sun.proxy.$Proxy9.append(Unknown Source) 
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:276) 
    at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1558) 
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1598) 
    at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1586) 
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:320) 
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:316) 
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 
    at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:316) 
    at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1161) 
    at com.cmp.eventconsumer.io.HdfsOutputManager.get(HdfsOutputManager.java:46) 
    at com.cmp.eventconsumer.EventConsumer.fetchEvents(EventConsumer.java:68) 
    at com.cmp.eventconsumer.EventConsumer.main(EventConsumer.java:112) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
+1

Did you ever solve this? I'm running into the same problem with the latest Cloudera release (http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_hdfs_mountable.html) –

+0

@ValerioSchiavoni: No, unfortunately not. We changed the implementation to append the current timestamp to the filename and merge the files together with a cron job. In another implementation we have a for loop from 1 to 10 and append 'i' to the filename; if no exception occurs, we skip the rest of the loop (see the sketch below). Not exactly elegant, but both workarounds do the job. – twes
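
A minimal sketch of that fallback loop, assuming fs and filename from the question's code (the suffix scheme and the retry count of 10 are illustrative):

OutputStream out = null; 
for (int i = 1; i <= 10 && out == null; i++) { 
    // try suffixed filenames until one accepts an append; a file whose 
    // lease is still held by the crashed client throws on append 
    Path candidate = new Path(filename + "." + i); 
    try { 
        if (!fs.exists(candidate)) { 
            fs.create(candidate).close(); 
        } 
        out = fs.append(candidate); // success: skip the remaining suffixes 
    } catch (IOException e) { 
        // lease still held on this file, fall through to the next suffix 
    } 
} 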

Answers

1

I had the same problem. My workaround is:

try { 
    // ... the append/write that fails while the old lease is still held ... 
} catch (IOException e) { 
    logger.info("trying to recover file lease: " + hdfspath); 
    // ask the NameNode to start lease recovery for the file 
    fileSystem.recoverLease(hdfspath); 
    boolean isClosed = fileSystem.isFileClosed(hdfspath); 
    long start = System.currentTimeMillis(); 
    // poll until the NameNode reports the file closed; give up after 60s 
    while (!isClosed) { 
        if (System.currentTimeMillis() - start > 60 * 1000) { 
            throw e; 
        } 
        try { 
            Thread.sleep(1000); 
        } catch (InterruptedException ignored) { 
        } 
        isClosed = fileSystem.isFileClosed(hdfspath); 
    } 
} 
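
Note: recoverLease() only asks the NameNode to start lease recovery and typically returns before the file is actually closed, which is why the loop polls isFileClosed() until recovery completes. Both methods exist on DistributedFileSystem, not on the generic FileSystem interface.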

0

You should close the file in a finally block.

OutputStream out = null; 
try { 
    out = fs.append(path); 
    out.write(message); 
} catch (IOException ex) { 
    // handle or log the error 
} finally { 
    // close the file even when the write fails 
    IOUtils.closeStream(out); 
} 

Do you also know what is causing the crashes?

If you're using Java 7: the auto-close feature (try-with-resources).
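
A minimal sketch of try-with-resources, assuming fs, path and message from the question's code:

// Java 7+ try-with-resources: the stream is closed automatically, 
// even when write() throws 
try (OutputStream out = fs.append(path)) { 
    out.write(message); 
} 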

+0

One case of a "crash" is when we have to kill the client because we want to update the software. I added the finally block, which helps when an exception is triggered (e.g. through our message-queue system). But if I hit CTRL+C, some shutdown hook in the Hadoop libraries still results in leftover open files. Is there a way to fix this manually? – twes

+0

Yes, there is a way: you need to register a callback for the process (OS signal) - Runtime.getRuntime().addShutdownHook – Andrew
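
A minimal sketch of such a hook, assuming the stream out from the question's code is reachable (e.g. stored in a final variable or a field):

// runs on normal JVM shutdown and on SIGINT/SIGTERM (e.g. CTRL+C), 
// but not on SIGKILL or a hard crash 
Runtime.getRuntime().addShutdownHook(new Thread() { 
    @Override 
    public void run() { 
        // close the HDFS stream so the NameNode can release the lease 
        IOUtils.closeStream(out); 
    } 
}); 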
