1
下面是CM上報告的健康問題的快照。列表中的datanode不斷變化。從數據管理部日誌中的一些錯誤:Cloudera Manager健康問題:NameNode連接,Web服務器狀態
3:59:31.859 PM ERROR org.apache.hadoop.hdfs.server.datanode.DataNode
datanode05.hadoop.com:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.248.200.113:45252 dest: /10.248.200.105:50010
java.io.IOException: Premature EOF from inputStream
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:194)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
5:46:03.606 PM INFO org.apache.hadoop.hdfs.server.datanode.DataNode
Exception for BP-846315089-10.248.200.4-1369774276029:blk_-780307518048042460_200374997
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.248.200.105:50010 remote=/10.248.200.122:43572]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:156)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at java.io.DataInputStream.read(DataInputStream.java:132)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:414)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:564)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:662)
快照:
我無法找出問題的根源。我可以手動從一個datanode連接到另一個沒有問題,我不相信這是一個網絡問題。此外,缺少的塊和不足複製的塊計數也會發生變化(向下調高&)。
Cloudera的經理:Cloudera的標準4.8.1
CDH 4.7
解決這個問題,表示讚賞任何幫助。
更新:2016年1月1日
對於列爲壞的數據節點,當我看到dadanode日誌,我看到這條消息了很多...
11:58:30.066 AM INFO org.apache.hadoop.hdfs.server.datanode.DataNode
Receiving BP-846315089-10.248.200.4-1369774276029:blk_-706861374092956879_36606459 src: /10.248.200.123:56795 dest: /10.248.200.112:50010
爲什麼這個數據節點接收很多來自其他數據節點的塊在同一時間?看起來由於這個活動,datanode無法及時響應namenode請求,因此超時。所有不好的datanodes都顯示相同的模式。
我試過telnet,它連接到其他節點成功。 它似乎不是防火牆問題。現在列爲連接問題的節點...在幾分鐘後不會顯示在列表中。它的開關。 – scott