Hadoop cluster datanodes cannot connect: is it normal for all nodes to have the same hostname?

I am setting up a Hadoop cluster (4 machines: 1 master running the NameNode and JobTracker, and 3 slaves each running a DataNode and TaskTracker), but none of the DataNodes can connect. I ran sudo netstat -ntlp on the master, and the output is:
tcp 0 0 0.0.0.0:52193 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:2049 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:39267 0.0.0.0:* LISTEN 7284/rpc.mountd
tcp 0 0 0.0.0.0:33219 0.0.0.0:* LISTEN 7284/rpc.mountd
tcp 0 0 0.0.0.0:41000 0.0.0.0:* LISTEN 1539/mongos
tcp6 0 0 :::50030 :::* LISTEN 604/java
tcp6 0 0 :::57134 :::* LISTEN 32646/java
tcp6 0 0 :::111 :::* LISTEN 13786/rpcbind
tcp6 0 0 :::57428 :::* LISTEN -
tcp6 0 0 :::57173 :::* LISTEN 7284/rpc.mountd
tcp6 0 0 :::50070 :::* LISTEN 32646/java
tcp6 0 0 :::5910 :::* LISTEN 2452/Xvnc
tcp6 0 0 :::22 :::* LISTEN 32473/sshd
tcp6 0 0 :::50744 :::* LISTEN 7284/rpc.mountd
tcp6 0 0 :::55036 :::* LISTEN 14031/rpc.statd
tcp6 0 0 :::42205 :::* LISTEN 7284/rpc.mountd
tcp6 0 0 :::44289 :::* LISTEN 504/java
tcp6 0 0 :::2049 :::* LISTEN -
tcp6 0 0 :::38950 :::* LISTEN 604/java
tcp6 0 0 192.168.10.10:9000 :::* LISTEN 32646/java
tcp6 0 0 192.168.10.10:9001 :::* LISTEN 604/java
tcp6 0 0 :::50090 :::* LISTEN 504/java
The exception from each of my 3 datanode machines is identical (they have different IPs, of course). Here is the error log from datanode 192.168.10.12:
2014-01-13 12:41:02,332 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-01-13 12:41:02,334 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2014-01-13 12:41:03,427 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mongodb/192.168.10.12:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-01-13 12:41:04,427 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mongodb/192.168.10.12:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-01-13 12:41:05,428 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mongodb/192.168.10.12:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
2014-01-13 12:41:06,428 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: mongodb/192.168.10.12:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
So what confuses me is why each datanode tries to connect to itself. For example, the error log of datanode 192.168.10.12 shows it trying to connect to 192.168.10.12 on port 9000, yet nothing is listening on that port on .12.
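One way to see where the connection attempt is really going is to check how the shared hostname resolves on each datanode (a diagnostic sketch; "mongodb" is this cluster's hostname, and the fallback messages are just placeholders):

```shell
# On each datanode, check what the shared hostname resolves to.
# /etc/hosts is normally consulted before DNS (per nsswitch.conf),
# so this shows the address the Hadoop IPC client will actually dial.
getent hosts mongodb || echo "mongodb does not resolve here"

# The node's own hostname, which Hadoop also uses for self-identification.
hostname

# Check whether the NameNode RPC port on the master is reachable at all.
nc -zv -w 2 192.168.10.10 9000 || echo "port 9000 on 192.168.10.10 not reachable"
```

If `getent hosts mongodb` prints the datanode's own IP, that would explain the log lines above: the client resolves the name locally and connects to itself.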
All my cluster nodes have the same hostname but different IPs (master: 192.168.10.10; slaves: 192.168.10.11, 192.168.10.12, 192.168.10.13), and all my configuration files, including core-site.xml, hdfs-site.xml and mapred-site.xml, use IP addresses directly. I ran sudo ufw status, and it shows the firewall is **inactive** on all machines!
The configuration files on the 4 machines are identical (except for the IPs, of course):
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.10.12:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/hadoop/tmp</value>
</property>
</configuration>
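For what it's worth, fs.default.name needs to be identical on every node and point at the machine where the NameNode actually runs; the netstat output above shows port 9000 listening on the master, 192.168.10.10, not on .12. Under that assumption, a corrected core-site.xml would presumably look like:

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- Must name the NameNode host (the master), identical on all nodes -->
    <name>fs.default.name</name>
    <value>hdfs://192.168.10.10:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/hadoop/tmp</value>
  </property>
</configuration>
```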
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/var/hadoop/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/var/hadoop/data</value>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://192.168.10.12:9001</value>
</property>
</configuration>
/etc/hosts on my master:
127.0.0.1 localhost
192.168.10.12 mongodb
192.168.10.12 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
In line 2 of that /etc/hosts file, mongodb is the host's hostname, and the hostnames of the other three slaves are also mongodb (because these machines were once used as a MongoDB cluster).
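A sketch of what /etc/hosts could look like instead: one unique name per node, no second mapping for localhost, and the same file on every machine (the names master/slave1..3 are placeholders, not from the original setup):

```
127.0.0.1      localhost
192.168.10.10  master
192.168.10.11  slave1
192.168.10.12  slave2
192.168.10.13  slave3
```

Mapping localhost to a non-loopback address, as in the file above, is a common source of daemons binding or resolving to the wrong interface.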
I strongly suspect that tcp6 is what is causing the problem. So, how do I start Hadoop using tcp instead of tcp6?
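Regarding the tcp6 listeners: the JVM can be told to prefer the IPv4 stack via a standard system property; for Hadoop this is conventionally added to conf/hadoop-env.sh on every node (a sketch under that assumption, not verified against this Hadoop version):

```shell
# In conf/hadoop-env.sh — make the Hadoop daemons bind IPv4 sockets
# instead of IPv6-mapped ones.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
```

Note that tcp6 sockets bound to :: normally still accept IPv4 connections, so this is usually cosmetic rather than the root cause.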
So, my questions are:
1. Is it OK that Hadoop started on tcp6 instead of tcp?
2. Can all cluster machines have the same hostname but different LAN IPs when running Hadoop?
Any suggestions?
Have you checked your firewall settings? – zhutoulala
I ran sudo ufw status; it shows inactive for all machines. – wuchang
Restart all four services (namenode, datanode, jobtracker and tasktracker), then check again. What IPs did you set in the hosts file on the namenode machine? – Sudz