2015-12-14 19 views
1

I removed vm4 from the slaves list, and the following command no longer reports it, yet YARN/Tez still tries to access the removed VM:

hdfs dfsadmin -report 

The result is:

[email protected]:~$ hdfs dfsadmin -report 
15/12/14 06:56:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 
Configured Capacity: 1268169326592 (1.15 TB) 
Present Capacity: 1199270457337 (1.09 TB) 
DFS Remaining: 1199213064192 (1.09 TB) 
DFS Used: 57393145 (54.73 MB) 
DFS Used%: 0.00% 
Under replicated blocks: 27 
Blocks with corrupt replicas: 0 
Missing blocks: 0 

------------------------------------------------- 
Datanodes available: 3 (3 total, 0 dead) 

Live datanodes: 
Name: 10.0.1.191:50010 (anmol-vm2-new) 
Hostname: anmol-vm2-new 
Decommission Status : Normal 
Configured Capacity: 422723108864 (393.69 GB) 
DFS Used: 19005440 (18.13 MB) 
Non DFS Used: 21501829120 (20.03 GB) 
DFS Remaining: 401202274304 (373.65 GB) 
DFS Used%: 0.00% 
DFS Remaining%: 94.91% 
Configured Cache Capacity: 0 (0 B) 
Cache Used: 0 (0 B) 
Cache Remaining: 0 (0 B) 
Cache Used%: 100.00% 
Cache Remaining%: 0.00% 
Last contact: Mon Dec 14 06:56:12 UTC 2015 


Name: 10.0.1.190:50010 (anmol-vm1-new) 
Hostname: anmol-vm1-new 
Decommission Status : Normal 
Configured Capacity: 422723108864 (393.69 GB) 
DFS Used: 19369984 (18.47 MB) 
Non DFS Used: 25831350272 (24.06 GB) 
DFS Remaining: 396872388608 (369.62 GB) 
DFS Used%: 0.00% 
DFS Remaining%: 93.88% 
Configured Cache Capacity: 0 (0 B) 
Cache Used: 0 (0 B) 
Cache Remaining: 0 (0 B) 
Cache Used%: 100.00% 
Cache Remaining%: 0.00% 
Last contact: Mon Dec 14 06:56:13 UTC 2015 


Name: 10.0.1.192:50010 (anmol-vm3-new) 
Hostname: anmol-vm3-new 
Decommission Status : Normal 
Configured Capacity: 422723108864 (393.69 GB) 
DFS Used: 19017721 (18.14 MB) 
Non DFS Used: 21565689863 (20.08 GB) 
DFS Remaining: 401138401280 (373.59 GB) 
DFS Used%: 0.00% 
DFS Remaining%: 94.89% 
Configured Cache Capacity: 0 (0 B) 
Cache Used: 0 (0 B) 
Cache Remaining: 0 (0 B) 
Cache Used%: 100.00% 
Cache Remaining%: 0.00% 
Last contact: Mon Dec 14 06:56:11 UTC 2015 

But at some point, YARN tries to access it. Here is the log I get:

yarn logs -applicationId application_1450050523156_0009 

http://pastebin.com/UVHnkRRp

Service org.apache.tez.dag.app.rm.TaskScheduler failed in state STARTED; cause: java.lang.IllegalArgumentException: java.net.UnknownHostException: anmol-vm4-new 
     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) 
     at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.newInstance(BaseNMTokenSecretManager.java:145) 
     at org.apache.hadoop.yarn.server.security.BaseNMTokenSecretManager.createNMToken(BaseNMTokenSecretManager.java:136) 
     at org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM.createAndGetOptimisticNMToken(NMTokenSecretManagerInRM.java:325) 
     at org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService.registerApplicationMaster(ApplicationMasterService.java:297) 
     at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationMasterProtocolPBServiceImpl.registerApplicationMaster(ApplicationMasterProtocolPBServiceImpl.java:90) 
     at org.apache.hadoop.yarn.proto.ApplicationMasterProtocol$ApplicationMasterProtocolService$2.callBlockingMethod(ApplicationMasterProtocol.java:95) 
     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) 
     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) 
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2014) 
     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2010) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:415) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1561) 
     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2008) 
Caused by: java.net.UnknownHostException: anmol-vm4-new 
     ... 15 more 

Any idea why it is trying to access vm4, which is not in the slaves list, and how this can be fixed?

Update: I did the following, but I still get an error because it tries to access vm4:

1) Added the files exclude and mapred.exclude, containing the private IP address of vm4, to the conf directory of yarnpp

2) Added this to mapred-site.xml:

<property>
    <name>mapred.hosts.exclude</name>
    <value>/home/hadoop/yarnpp/conf/mapred.exclude</value>
    <description>Names a file that contains the list of hosts that
    should be excluded by the jobtracker. If the value is empty, no
    hosts are excluded.</description>
</property>

3) Added this to hdfs-site.xml:

<property> 
<name>dfs.hosts.exclude</name> 
<value>/home/hadoop/yarnpp/conf/exclude</value> 
<final>true</final> 
</property> 

3.5) Added this to yarn-site.xml:

<property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/home/hadoop/yarnpp/conf/exclude</value>
    <description>Path to file with nodes to exclude.</description>
</property>

4) Ran cp_host.sh to copy the conf directory to all the slaves.

5) Ran the reboot_everything script (which does stop-all.sh, a format, and start-all.sh)

6) Ran hadoop dfsadmin -refreshNodes

7) Ran this command on the master VM:

yarn rmadmin -refreshNodes 
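
After the two refreshNodes calls above, one way to see whether the ResourceManager still knows about the removed node is to search the output of `yarn node -list -all` for its hostname. The following is only a minimal sketch: the function name `node_listed` is hypothetical (it is not part of Hadoop), and it reads the listing from stdin so it can be tried without a cluster.

```shell
#!/bin/sh
# node_listed HOST (hypothetical helper, not a Hadoop command):
# succeeds if HOST appears anywhere in a node listing read from stdin.
node_listed() {
  grep -q -F -- "$1"
}

# On a live cluster (not executed here):
#   yarn node -list -all | node_listed anmol-vm4-new && echo "vm4 is still registered"
```

If the hostname still shows up after the refresh, the ResourceManager has probably not picked up the new exclude file, or an old process is still running.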

And here is the new log: http://pastebin.com/cKPY9gmB

vm4 does not appear in the list of nodes, not even as an unhealthy node, as shown here: (image)

And now, with all these updates, when I run the gridmix-generate.sh job, I get this error:

15/12/14 10:14:53 INFO ipc.Client: Retrying connect to server: anmol-vm3-new/10.0.1.192:50833. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 
+1

How did you remove it from the slave nodes list?

+1

Did you restart the ResourceManager after removing the IP from the slaves file?

+0

@ManjunathBallur I just have a slaves file in the conf directory, and I also removed it from /etc/hosts

Answer

1

After talking with Monal in chat, the issue has been resolved.

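When someone runs the stop-all.sh command, sometimes not all of the processes actually stop. It is good practice to run ps -ef afterwards to make sure that all processes on all nodes have stopped. Monal ran stop-all.sh and then ran ps -ef|grep -i datanode, which still showed results.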

Then, in chat, I asked her to restart all the VMs, which would clear the hanging processes. The hard reboot resolved the problem.
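
The check described above can be sketched as a small filter over `ps -ef` output. This is only an illustration: the function name `hadoop_daemons` and the list of daemon names in the pattern are my assumptions, so adjust them to whatever actually runs on your nodes.

```shell
#!/bin/sh
# hadoop_daemons (hypothetical helper): reads a `ps -ef` listing on stdin and
# keeps only the lines that mention a Hadoop daemon, dropping the grep itself.
hadoop_daemons() {
  grep -i -E 'datanode|namenode|nodemanager|resourcemanager' | grep -v -w grep
}

# On each node after stop-all.sh (not executed here):
#   ps -ef | hadoop_daemons
```

Any line this prints is a process that survived stop-all.sh. It needs to be killed, or the VM rebooted as was done here, before the cluster is started again.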

+0

The hard reboot through the provider did it. Simply doing a stop and start did not solve the problem.