
java.io.EOFException in HDFS 0.22.0. I'm using the following code to read bytes from a file:

FileSystem fs = config.getHDFS();
try {
    Path path = new Path(dirName + '/' + fileName);

    byte[] bytes = new byte[(int) fs.getFileStatus(path).getLen()];
    in = fs.open(path);

    in.read(bytes);
    result = new DataInputStream(new ByteArrayInputStream(bytes));
} catch (Exception e) {
    e.printStackTrace();
    if (in != null) {
        try {
            in.close();
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}

There are about 15,000 files in the directory I am reading from. After a certain point, I get this exception on the in.read(bytes) line:

2012-05-31 14:11:45,477 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue 
java.io.EOFException 
     at java.io.DataInputStream.readShort(DataInputStream.java:298) 
     at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:115) 
     at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:427) 
     at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:725) 
     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390) 
     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514) 
     at java.io.DataInputStream.read(DataInputStream.java:83) 

Another exception that gets thrown is:

2012-05-31 15:09:14,849 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue 
java.net.SocketException: No buffer space available (maximum connections reached?): connect 
    at sun.nio.ch.Net.connect(Native Method) 
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507) 
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192) 
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373) 
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:719) 
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390) 
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514) 
    at java.io.DataInputStream.read(DataInputStream.java:83) 

Please advise what the problem might be.

Answer


You're ignoring the return value from in.read and assuming you can read the whole file in one call. Don't do that. Loop until read returns -1, or until you've read as much data as you expect. It's not clear to me that you should really trust getLen() like this anyway: what happens if the file grows (or shrinks) between the two calls? I'd suggest creating a ByteArrayOutputStream as temporary storage and a small (16K?) buffer, then looping: read into the buffer, write that many bytes to the output stream, lather, rinse, repeat until read returns -1 to signal the end of the stream. Then you can take the data out of the ByteArrayOutputStream and put it into a ByteArrayInputStream just as you did before.

Edit: quick code, untested. Guava has similar (and better) code for this, by the way.

public static byte[] readFully(InputStream stream) throws IOException { 
    ByteArrayOutputStream baos = new ByteArrayOutputStream(); 
    byte[] buffer = new byte[16 * 1024]; 
    int bytesRead; 
    while ((bytesRead = stream.read(buffer)) > 0) { 
     baos.write(buffer, 0, bytesRead); 
    } 
    return baos.toByteArray(); 
} 

Then just use:

in = fs.open(path); 
byte[] data = readFully(in); 
result = new DataInputStream(new ByteArrayInputStream(data)); 
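
For reference, the Guava alternative mentioned above is essentially a one-liner; this is a sketch, assuming Guava's com.google.common.io.ByteStreams is on the classpath:

// assumes: import com.google.common.io.ByteStreams;
in = fs.open(path);
byte[] data = ByteStreams.toByteArray(in);   // loops internally until EOF, so short reads are handled
result = new DataInputStream(new ByteArrayInputStream(data));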

Also note that you should close the stream in a finally block, not only when an exception is thrown. I'd also advise against catching Exception itself.
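
A minimal sketch of that structure, reusing the fs, path and result variables from the original snippet (the error handling here is illustrative only):

FSDataInputStream in = null;
try {
    in = fs.open(path);
    byte[] data = readFully(in);
    result = new DataInputStream(new ByteArrayInputStream(data));
} catch (IOException e) {
    // catch the specific exception you can actually handle, not Exception
    e.printStackTrace();
} finally {
    if (in != null) {
        try {
            in.close();   // always close, on success or failure
        } catch (IOException closeError) {
            // nothing useful to do if close itself fails
        }
    }
}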


Just to be clear, like this: baos = new ByteArrayOutputStream(); byte[] bytes = new byte[16]; in = fs.open(path); while (in.read(bytes) > 0) { baos.write(bytes); } result = new DataInputStream(new ByteArrayInputStream(baos.toByteArray())); I still get the error after this change –


@matroyd: Which part don't you understand: what is wrong, or how to fix it? I've added some sample code, but the code in your comment *still* assumes that on each iteration 'in.read(bytes)' reads useful data into the *whole* of 'bytes'. –


YOU ROCK!! Thanks for the help, Jon –
