
java.io.EOFException in HDFS 0.22.0. I'm using the following code to read bytes from a file:

FileSystem fs = config.getHDFS();
try {
    Path path = new Path(dirName + '/' + fileName);

    byte[] bytes = new byte[(int) fs.getFileStatus(path).getLen()];
    in = fs.open(path);

    in.read(bytes);
    result = new DataInputStream(new ByteArrayInputStream(bytes));
} catch (Exception e) {
    e.printStackTrace();
    if (in != null) {
        try {
            in.close();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}

There are about 15,000 files in the directory I am reading from. After a certain point, I get this exception on the in.read(bytes) line:

2012-05-31 14:11:45,477 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue 
java.io.EOFException 
     at java.io.DataInputStream.readShort(DataInputStream.java:298) 
     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:115) 
     at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:427) 
     at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:725) 
     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390) 
     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514) 
     at java.io.DataInputStream.read(DataInputStream.java:83) 

Another exception that gets thrown is:

2012-05-31 15:09:14,849 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue 
java.net.SocketException: No buffer space available (maximum connections reached?): connect 
    at sun.nio.ch.Net.connect(Native Method) 
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507) 
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) 
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) 
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:719) 
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390) 
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514) 
    at java.io.DataInputStream.read(DataInputStream.java:83) 

Please advise what the problem might be.

Answer


You're ignoring the return value from in.read and assuming you can read the whole file in one call. Don't do that. Loop until read returns -1, or until you've read as much data as you expect. It's not clear to me that you should really trust getLen() like this anyway: what happens if the file grows (or shrinks) between the two calls? I'd suggest creating a ByteArrayOutputStream as temporary storage and a small (16K?) buffer, then looping: read into the buffer, write that many bytes to the output stream, lather, rinse, repeat until read returns -1 to signal the end of the stream. Then you can take the data out of the ByteArrayOutputStream and put it into a ByteArrayInputStream just as you did before.

Edit: quick code, untested. Guava has similar (and better) code for this, by the way.

public static byte[] readFully(InputStream stream) throws IOException { 
    ByteArrayOutputStream baos = new ByteArrayOutputStream(); 
    byte[] buffer = new byte[16 * 1024]; 
    int bytesRead; 
    while ((bytesRead = stream.read(buffer)) > 0) { 
     baos.write(buffer, 0, bytesRead); 
    } 
    return baos.toByteArray(); 
} 

Then just use:

in = fs.open(path); 
byte[] data = readFully(in); 
result = new DataInputStream(new ByteArrayInputStream(data)); 
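
For reference, the Guava alternative mentioned above is essentially a one-liner; this is a sketch, assuming Guava's com.google.common.io.ByteStreams is on the classpath:

// assumes: import com.google.common.io.ByteStreams;
in = fs.open(path);
byte[] data = ByteStreams.toByteArray(in);   // loops internally until EOF, so short reads are handled
result = new DataInputStream(new ByteArrayInputStream(data));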

Also note that you should close the stream in a finally block, not only when an exception is thrown. I'd also advise against catching Exception itself.
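
A minimal sketch of that structure, reusing the fs, path and result variables from the original snippet (the error handling here is illustrative only):

FSDataInputStream in = null;
try {
    in = fs.open(path);
    byte[] data = readFully(in);
    result = new DataInputStream(new ByteArrayInputStream(data));
} catch (IOException e) {
    // catch the specific exception you can actually handle, not Exception
    e.printStackTrace();
} finally {
    if (in != null) {
        try {
            in.close();   // always close, on success or failure
        } catch (IOException closeError) {
            // nothing useful to do if close itself fails
        }
    }
}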


Just to be clear, like this: baos = new ByteArrayOutputStream(); byte[] bytes = new byte[16]; in = fs.open(path); while (in.read(bytes) > 0) { baos.write(bytes); } result = new DataInputStream(new ByteArrayInputStream(baos.toByteArray())); I still get the error after this change –


@matroyd: Which part don't you understand: what is wrong, or how to fix it? I've added some sample code, but the code in your comment *still* assumes that on each iteration 'in.read(bytes)' reads useful data into the *whole* of 'bytes'. –


YOU ROCK!! Thanks for the help, Jon –
