2013-05-12 50 views

Slow Hive query performance under AWS Elastic MapReduce

I've run into a strange problem, and I assure you I have searched extensively.

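I'm running a set of AWS Elastic MapReduce clusters, and I have a Hive table with about 16 partitions. The partitions were created by emr-s3distcp (because the original s3 bucket holds about 216K files), using --groupBy with the size limit set to 64MiB (the DFS block size in this case). They are plain text files with one JSON object per line, read through a JSON SerDe.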

When I run this script, it takes a very long time and then gives up because of some IPC connection failures.

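Initially, the pressure from s3distcp onto HDFS was so high that I took some measures (read: resizing to higher-capacity machines, then setting DFS to 3x replication, since it is a small cluster, with the block size set to 64MiB). That worked, and the number of under-replicated blocks dropped to zero (in EMR the default for less than 3 is 2, but I had changed it to 3).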

Looking at /mnt/var/log/apps/hive_081.log yields several lines like this:

2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(222)) - The ping interval is60000ms. 
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(265)) - Use SIMPLE authentication for protocol ClientProtocol 
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:setupIOstreams(551)) - Connecting to /10.17.17.243:9000 
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:sendParam(769)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop sending #14 
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(742)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: starting, having connections 2 
2013-05-12 09:56:12,125 DEBUG org.apache.hadoop.ipc.Client (Client.java:receiveResponse(804)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop got value #14 
2013-05-12 09:56:12,126 DEBUG org.apache.hadoop.ipc.RPC (RPC.java:invoke(228)) - Call: getFileInfo 6 
2013-05-12 09:56:21,523 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 6 time(s). 
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:close(876)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: closed 
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(752)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: stopped, remaining connections 1 
2013-05-12 09:56:42,544 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 7 time(s). 

And so on, until one of the clients hits its retry limit.
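
One way to read a log like the one above is to tally which addresses actually answer and which are only ever retried. The following is a minimal sketch, not part of the original post: the function names are hypothetical, and the regular expressions are modelled only on the log lines shown here, so other Hadoop versions may need different patterns.

```python
import re

# Patterns modelled on the log excerpt above (an assumption; formats vary by Hadoop version).
# A "got value" line means the server at that address answered an RPC call.
ANSWER_RE = re.compile(
    r"connection to \S*/(\d+\.\d+\.\d+\.\d+):(\d+) from \S+ got value"
)
# A "Retrying connect" line means the client could not reach that address.
RETRY_RE = re.compile(
    r"Retrying connect to server: (\S*)/(\d+\.\d+\.\d+\.\d+):(\d+)\. "
    r"Already tried (\d+) time\(s\)"
)


def summarize_ipc(lines):
    """Return (set of addresses that answered, {retried address: highest retry count})."""
    answered = set()
    retries = {}
    for line in lines:
        match = ANSWER_RE.search(line)
        if match:
            answered.add(f"{match.group(1)}:{match.group(2)}")
            continue
        match = RETRY_RE.search(line)
        if match:
            addr = f"{match.group(2)}:{match.group(3)}"
            retries[addr] = max(retries.get(addr, 0), int(match.group(4)))
    return answered, retries


def unreachable_addresses(lines):
    """Addresses that are retried but never answer anywhere in the given lines."""
    answered, retries = summarize_ipc(lines)
    return sorted(addr for addr in retries if addr not in answered)
```

It could be run as `unreachable_addresses(open("/mnt/var/log/apps/hive_081.log"))`. Any address it reports is worth comparing against the nodes that currently belong to the cluster.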

What is needed to fix this under Elastic MapReduce?

Thanks

Answer


After a while, I noticed that the offending IP address was not even in my cluster, so this was a stuck Hive metastore. I fixed it with the following:

CREATE TABLE whatever_2 LIKE whatever LOCATION <hdfs_location>; 

ALTER TABLE whatever_2 RECOVER PARTITIONS; 

Hope it helps.