Slow Hive query performance under AWS Elastic MapReduce
I have a weird issue, and I assure you I have searched for it plenty.
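I am running a set of AWS Elastic MapReduce clusters, and I have a Hive table with about 16 partitions. The partitions were created with emr-s3distcp (because the original s3 bucket holds about 216K files), using --groupBy with the size limit set to 64MiB (the DFS block size in this case). They are simply text files with one JSON object per line, read through a JSON SerDe.
When I run this script, it takes a very long time and then gives up because of some IPC connections.
Initially, the strain from s3distcp to HDFS was so high that I took some measures (read: resizing to higher-capacity machines, then setting DFS to 3-fold replication, since it is a small cluster, and the block size to 64MiB). That worked, and the number of under-replicated blocks dropped to zero (the EMR default here was 2 rather than 3, but I changed it to 3).
Looking into /mnt/var/log/apps/hive_081.log yields several lines like this: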
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(222)) - The ping interval is60000ms.
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:<init>(265)) - Use SIMPLE authentication for protocol ClientProtocol
2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:setupIOstreams(551)) - Connecting to /10.17.17.243:9000
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:sendParam(769)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop sending #14
2013-05-12 09:56:12,121 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(742)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: starting, having connections 2
2013-05-12 09:56:12,125 DEBUG org.apache.hadoop.ipc.Client (Client.java:receiveResponse(804)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop got value #14
2013-05-12 09:56:12,126 DEBUG org.apache.hadoop.ipc.RPC (RPC.java:invoke(228)) - Call: getFileInfo 6
2013-05-12 09:56:21,523 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 6 time(s).
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:close(876)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: closed
2013-05-12 09:56:22,122 DEBUG org.apache.hadoop.ipc.Client (Client.java:run(752)) - IPC Client (47) connection to /10.17.17.243:9000 from hadoop: stopped, remaining connections 1
2013-05-12 09:56:42,544 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 7 time(s).
And so on, until one of the clients hits the retry limit.
What is required to fix this under Elastic MapReduce?
Thanks
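As a rough way to see which address the retries are aimed at, the excerpt above can be summarised with a short Python sketch. The function name `summarize_retries` and the regular expression are my own illustration, not part of Hive or Hadoop; the sample lines are copied from the log above.

```python
import re
from collections import defaultdict

# Matches the "Retrying connect to server" lines from the log excerpt.
# The pattern is an assumption based only on the lines shown above.
RETRY_RE = re.compile(
    r"Retrying connect to server: (?P<server>\S+?)\. "
    r"Already tried (?P<tries>\d+) time\(s\)"
)

# Three lines copied from the excerpt: one successful connect, two retries.
SAMPLE = [
    "2013-05-12 09:56:12,120 DEBUG org.apache.hadoop.ipc.Client (Client.java:setupIOstreams(551)) - Connecting to /10.17.17.243:9000",
    "2013-05-12 09:56:21,523 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 6 time(s).",
    "2013-05-12 09:56:42,544 INFO org.apache.hadoop.ipc.Client (Client.java:handleConnectionFailure(663)) - Retrying connect to server: domU-12-31-39-10-81-2A.compute-1.internal/10.198.130.216:9000. Already tried 7 time(s).",
]


def summarize_retries(lines):
    """Return {server: highest retry count seen} for the given log lines."""
    worst = defaultdict(int)
    for line in lines:
        match = RETRY_RE.search(line)
        if match:
            server = match.group("server")
            worst[server] = max(worst[server], int(match.group("tries")))
    return dict(worst)


if __name__ == "__main__":
    for server, tries in summarize_retries(SAMPLE).items():
        print(f"{server}: retried up to {tries} time(s)")
```

On the full log, the same function could be fed the open file object, for example `summarize_retries(open("/mnt/var/log/apps/hive_081.log"))`. In the excerpt, the retried address (10.198.130.216:9000) differs from the one the client connects to successfully (10.17.17.243:9000).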