
I built a Hadoop & Hive cluster and tried to run some tests on it, but it is really slow: Hive executes "insert into ... values ..." very slowly.

Table VALUE_COUNT

+--------------------------------------------------------------+--+ 
|      createtab_stmt      | 
+--------------------------------------------------------------+--+ 
| CREATE TABLE `value_count`(         | 
| `key` int,             | 
| `count` int,            | 
| `create_date` date COMMENT '????')       | 
| COMMENT 'This is a group table'        | 
| ROW FORMAT SERDE            | 
| 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'    | 
| STORED AS INPUTFORMAT          | 
| 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'   | 
| OUTPUTFORMAT             | 
| 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'   | 
| LOCATION              | 
| 'hdfs://avatarcluster/hive/warehouse/test.db/value_count' | 
| TBLPROPERTIES (            | 
| 'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',  | 
| 'numFiles'='7',           | 
| 'numRows'='7',            | 
| 'rawDataSize'='448',          | 
| 'totalSize'='2297',          | 
| 'transient_lastDdlTime'='1496217645')      | 
+--------------------------------------------------------------+--+ 

SQL executed

insert into value_count values (5, 1, '2017-05-06'); 

I have run this SQL statement several times, and each run takes about four or five minutes.

Hadoop container log

2017-05-31 16:00:45,041 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Central Dispatcher queue size after DAG completion, before cleanup: 0 
2017-05-31 16:00:45,041 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Waiting for next DAG to be submitted. 
2017-05-31 16:00:45,042 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Cleaning up DAG: name=insert into value_count valu...'2017-05-06')(Stage-1), with id=dag_1490688643958_53401_1 
2017-05-31 16:00:45,042 [INFO] [Dispatcher thread {Central}] |container.AMContainerMap|: Cleaned up completed containers on dagComplete. Removed=0, Remaining=1 
2017-05-31 16:00:45,044 [INFO] [Dispatcher thread {Central}] |app.DAGAppMaster|: Completed cleanup for DAG: name=insert into value_count valu...'2017-05-06')(Stage-1), with id=dag_1490688643958_53401_1 
2017-05-31 16:00:50,749 [INFO] [DelayedContainerManager] |rm.YarnTaskSchedulerService|: No taskRequests. Container's idle timeout delay expired or is new. Releasing container, containerId=container_1490688643958_53401_01_000002, containerExpiryTime=1496217650577, idleTimeout=5000, taskRequestsCount=0, heldContainers=1, delayedContainers=0, isNew=false 
2017-05-31 16:00:50,752 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1490688643958_53401_1][Event:CONTAINER_STOPPED]: containerId=container_1490688643958_53401_01_000002, stoppedTime=1496217650751, exitStatus=0 
2017-05-31 16:00:50,753 [INFO] [ContainerLauncher #1] |launcher.TezContainerLauncherImpl|: Stopping container_1490688643958_53401_01_000002 
2017-05-31 16:00:50,753 [INFO] [ContainerLauncher #1] |impl.ContainerManagementProtocolProxy|: Opening proxy : app08.hp.sp.tst.bmsre.com:51640 
2017-05-31 16:00:51,628 [INFO] [Dispatcher thread {Central}] |container.AMContainerImpl|: Container container_1490688643958_53401_01_000002 exited with diagnostics set to Container failed, exitCode=-105. Container killed by the ApplicationMaster. 
Container killed on request. Exit code is 143 
Container exited with a non-zero exit code 143 

2017-05-31 16:01:29,678 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:6144, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 51 lastPreemptionHeartbeat: 50 
2017-05-31 16:02:19,740 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:6144, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 101 lastPreemptionHeartbeat: 100 
2017-05-31 16:03:09,801 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:6144, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 151 lastPreemptionHeartbeat: 150 
2017-05-31 16:03:59,858 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:6144, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 201 lastPreemptionHeartbeat: 200 
2017-05-31 16:04:49,915 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:6144, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 251 lastPreemptionHeartbeat: 250 
2017-05-31 16:05:39,971 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: <memory:0, vCores:0> Free: <memory:6144, vCores:1> pendingRequests: 0 delayedContainers: 0 heartbeats: 301 lastPreemptionHeartbeat: 300 
2017-05-31 16:06:09,581 [INFO] [DAGSubmissionTimer] |rm.TaskSchedulerManager|: TaskScheduler notified that it should unregister from RM 
2017-05-31 16:06:09,581 [INFO] [DAGSubmissionTimer] |app.DAGAppMaster|: No current running DAG, shutting down the AM 
2017-05-31 16:06:09,581 [INFO] [DAGSubmissionTimer] |app.DAGAppMaster|: DAGAppMasterShutdownHandler invoked 
2017-05-31 16:06:09,581 [INFO] [DAGSubmissionTimer] |app.DAGAppMaster|: Handling DAGAppMaster shutdown 
2017-05-31 16:06:09,582 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Sleeping for 5 seconds before shutting down 
2017-05-31 16:06:14,582 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Calling stop for all the services 
2017-05-31 16:06:14,582 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Initiating stop of YarnTaskScheduler 
2017-05-31 16:06:14,582 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Releasing held containers 
2017-05-31 16:06:14,583 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Removing all pending taskRequests 
2017-05-31 16:06:14,583 [INFO] [AMShutdownThread] |history.HistoryEventHandler|: Stopping HistoryEventHandler 
2017-05-31 16:06:14,583 [INFO] [AMShutdownThread] |recovery.RecoveryService|: Stopping RecoveryService 
2017-05-31 16:06:14,583 [INFO] [AMShutdownThread] |recovery.RecoveryService|: Handle the remaining events in queue, queue size=0 
2017-05-31 16:06:14,584 [INFO] [RecoveryEventHandlingThread] |recovery.RecoveryService|: EventQueue take interrupted. Returning 
2017-05-31 16:06:14,584 [INFO] [AMShutdownThread] |recovery.RecoveryService|: Closing Summary Stream 
2017-05-31 16:06:14,611 [INFO] [AMShutdownThread] |impl.SimpleHistoryLoggingService|: Stopping SimpleHistoryLoggingService, eventQueueBacklog=0 
2017-05-31 16:06:14,611 [INFO] [HistoryEventHandlingThread] |impl.SimpleHistoryLoggingService|: EventQueue take interrupted. Returning 
2017-05-31 16:06:14,613 [INFO] [DelayedContainerManager] |rm.YarnTaskSchedulerService|: AllocatedContainerManager Thread interrupted 
2017-05-31 16:06:14,615 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Unregistering application from RM, exitStatus=SUCCEEDED, exitMessage=Session stats:submittedDAGs=0, successfulDAGs=1, failedDAGs=0, killedDAGs=0 
, trackingURL= 
2017-05-31 16:06:14,620 [INFO] [AMShutdownThread] |impl.AMRMClientImpl|: Waiting for application to be successfully unregistered. 
2017-05-31 16:06:14,720 [INFO] [AMShutdownThread] |rm.YarnTaskSchedulerService|: Successfully unregistered application from RM 
2017-05-31 16:06:14,721 [INFO] [AMShutdownThread] |rm.TaskSchedulerManager|: Shutting down AppCallbackExecutor 
2017-05-31 16:06:14,721 [INFO] [AMRM Callback Handler Thread] |impl.AMRMClientAsyncImpl|: Interrupted while waiting for queue 
java.lang.InterruptedException 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) 
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) 
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) 
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:274) 
2017-05-31 16:06:14,726 [INFO] [AMShutdownThread] |mortbay.log|: Stopped [email protected]:0 
2017-05-31 16:06:14,826 [INFO] [AMShutdownThread] |ipc.Server|: Stopping server on 49703 
2017-05-31 16:06:14,827 [INFO] [IPC Server listener on 49703] |ipc.Server|: Stopping IPC Server listener on 49703 
2017-05-31 16:06:14,827 [INFO] [AMShutdownThread] |ipc.Server|: Stopping server on 43709 
2017-05-31 16:06:14,827 [INFO] [IPC Server Responder] |ipc.Server|: Stopping IPC Server Responder 
2017-05-31 16:06:14,827 [INFO] [IPC Server listener on 43709] |ipc.Server|: Stopping IPC Server listener on 43709 
2017-05-31 16:06:14,827 [INFO] [IPC Server Responder] |ipc.Server|: Stopping IPC Server Responder 
2017-05-31 16:06:14,830 [INFO] [Thread-2] |app.DAGAppMaster|: DAGAppMasterShutdownHook invoked 
2017-05-31 16:06:14,830 [INFO] [Thread-2] |app.DAGAppMaster|: The shutdown handler is still running, waiting for it to complete 
2017-05-31 16:06:14,844 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Completed deletion of tez scratch data dir, path=hdfs://avatarcluster/tmp/hive/hadoop/_tez_session_dir/46c45420-9bdf-40a5-83a5-c8d1d496abb8/.tez/application_1490688643958_53401 
2017-05-31 16:06:14,844 [INFO] [AMShutdownThread] |app.DAGAppMaster|: Exiting DAGAppMaster..GoodBye! 
2017-05-31 16:06:14,844 [INFO] [Thread-2] |app.DAGAppMaster|: The shutdown handler has completed 

Hadoop cluster architecture

app05: NameNode/ResourceManager/JournalNode 
app10: NameNode(Standby)/JournalNode 
app08: DataNode/NodeManager/JournalNode 
app09: DataNode/NodeManager 

app05/08/09/10 are my test machines; each has 32 vcores and 48 GB of RAM.

Hadoop configuration

core-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
    <property> 
     <name>fs.defaultFS</name> 
     <value>hdfs://avatarcluster</value> 
    </property> 
    <property> 
     <name>io.file.buffer.size</name> 
     <value>131072</value> 
    </property> 
    <property> 
     <name>hadoop.proxyuser.hadoop.hosts</name> 
     <value>*</value> 
    </property> 
    <property> 
     <name>hadoop.proxyuser.hadoop.groups</name> 
     <value>*</value> 
    </property> 
    <property> 
     <name>hadoop.tmp.dir</name> 
     <value>/home/hadoop/hadoop-data/</value> 
    </property> 
    <property> 
     <name>ha.zookeeper.quorum</name> 
     <value>app02.hp.sp.tst.bmsre.com:2181</value> 
    </property> 
<property> 
    <name>io.compression.codecs</name> 
    <value>org.apache.hadoop.io.compress.SnappyCodec</value> 
</property> 
</configuration> 

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
    <property> 
     <name>dfs.nameservices</name> 
     <value>avatarcluster</value> 
    </property> 
    <property> 
     <name>dfs.ha.namenodes.avatarcluster</name> 
     <value>nn1,nn2</value> 
    </property> 
    <property> 
     <name>dfs.namenode.rpc-address.avatarcluster.nn1</name> 
     <value>app05.hp.sp.tst.bmsre.com:8020</value> 
    </property> 
    <property> 
     <name>dfs.namenode.rpc-address.avatarcluster.nn2</name> 
     <value>app10.hp.sp.tst.bmsre.com:8020</value> 
    </property> 
    <property> 
     <name>dfs.namenode.http-address.avatarcluster.nn1</name> 
     <value>app05.hp.sp.tst.bmsre.com:50070</value> 
    </property> 
    <property> 
     <name>dfs.namenode.http-address.avatarcluster.nn2</name> 
     <value>app10.hp.sp.tst.bmsre.com:50070</value> 
    </property> 
    <property> 
     <name>dfs.namenode.shared.edits.dir</name> 
     <value>qjournal://app05.hp.sp.tst.bmsre.com:8485;app10.hp.sp.tst.bmsre.com:8485;app08.hp.sp.tst.bmsre.com:8485/avatarcluster 
     </value> 
    </property> 
    <property> 
     <name>dfs.client.failover.proxy.provider.avatarcluster</name> 
     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 
    </property> 
    <property> 
     <name>dfs.ha.fencing.methods</name> 
     <value>sshfence</value> 
    </property> 
    <property> 
     <name>dfs.ha.fencing.ssh.private-key-files</name> 
     <value>/home/hadoop/.ssh/id_rsa</value> 
    </property> 
    <property> 
     <name>dfs.journalnode.edits.dir</name> 
     <value>/home/hadoop/hadoop/journal-data</value> 
    </property> 
    <property> 
     <name>dfs.ha.automatic-failover.enabled</name> 
     <value>true</value> 
    </property> 
    <property> 
     <name>dfs.namenode.name.dir</name> 
     <value>/home/hadoop/hadoop/namenode</value> 
    </property> 
    <property> 
     <name>dfs.blocksize</name> 
     <value>134217728</value> 
    </property> 
    <property> 
     <name>dfs.namenode.handler.count</name> 
     <value>100</value> 
    </property> 
    <property> 
     <name>dfs.replication</name> 
     <value>2</value> 
    </property> 
</configuration> 

mapred-site.xml

<?xml version="1.0"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
    <property> 
     <name>mapreduce.framework.name</name> 
     <value>yarn</value> 
    </property> 
    <property> 
     <name>mapreduce.jobhistory.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:10020</value> 
    </property> 
    <property> 
     <name>mapreduce.jobhistory.webapp.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:19888</value> 
    </property> 
    <property> 
     <name>mapreduce.jobhistory.intermediate-done-dir</name> 
     <value>/mr-history/tmp</value> 
    </property> 
    <property> 
     <name>mapreduce.jobhistory.done-dir</name> 
     <value>/mr-history/done</value> 
    </property> 
    <property> 
     <name>mapred.output.compress</name> 
     <value>true</value> 
    </property> 
    <property> 
     <name>mapred.output.compression.codec</name> 
     <value>org.apache.hadoop.io.compress.SnappyCodec</value> 
    </property> 
    <property> 
     <name>mapred.compress.map.output</name> 
     <value>true</value> 
    </property> 
    <property> 
     <name>mapreduce.map.memory.mb</name> 
     <value>3048</value> 
    </property> 
    <property> 
     <name>mapreduce.reduce.memory.mb</name> 
     <value>3048</value> 
    </property> 
    <property> 
     <name>mapreduce.map.java.opts</name> 
     <value>-Xmx2024m</value> 
    </property> 
    <property> 
     <name>mapreduce.reduce.java.opts</name> 
     <value>-Xmx2024m</value> 
    </property> 
</configuration> 

tez-site.xml

<?xml version="1.0" encoding="UTF-8"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 

<configuration> 
    <property> 
     <name>tez.lib.uris</name> 
     <value>${fs.defaultFS}/apps/tez-0.8.5.tar.gz</value> 
    </property> 
    <property> 
     <name>tez.am.resource.memory.mb</name> 
     <value>2048</value> 
    </property> 
    <property> 
     <name>tez.task.resource.memory.mb</name> 
     <value>2048</value> 
    </property> 
</configuration> 

yarn-site.xml

<?xml version="1.0"?> 
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration> 
    <property> 
     <name>yarn.admin.acl</name> 
     <value>*</value> 
    </property> 
    <property> 
     <name>yarn.resourcemanager.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:8032</value> 
    </property> 
    <property> 
     <name>yarn.resourcemanager.scheduler.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:8030</value> 
    </property> 
    <property> 
     <name>yarn.resourcemanager.resource-tracker.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:8031</value> 
    </property> 
    <property> 
     <name>yarn.resourcemanager.admin.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:8033</value> 
    </property> 
    <property> 
     <name>yarn.resourcemanager.webapp.address</name> 
     <value>app05.hp.sp.tst.bmsre.com:8088</value> 
    </property> 
    <property> 
     <name>yarn.nodemanager.local-dirs</name> 
     <value>/home/hadoop/hadoop/nodemanager-workdir</value> 
    </property> 
    <property> 
     <name>yarn.nodemanager.log-dirs</name> 
     <value>/home/hadoop/hadoop/nodemanager-logs</value> 
    </property> 
    <property> 
     <name>yarn.nodemanager.log.retain-seconds</name> 
     <value>3600</value> 
    </property> 
    <property> 
     <name>yarn.nodemanager.remote-app-log-dir</name> 
     <value>/home/hadoop/hadoop/nodemanager-remote-app-logs</value> 
    </property> 
    <property> 
     <name>yarn.nodemanager.remote-app-log-dir-suffix</name> 
     <value>logs</value> 
    </property> 
    <property> 
    <name>yarn.nodemanager.aux-services</name> 
    <value>mapreduce_shuffle</value> 
    </property> 
    <property> 
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name> 
    <value>org.apache.hadoop.mapred.ShuffleHandler</value> 
    </property> 
    <property> 
     <name>yarn.nodemanager.resource.memory-mb</name> 
     <value>4096</value> 
    </property> 
    <property> 
     <name>yarn.scheduler.minimum-allocation-mb</name> 
     <value>1024</value> 
    </property> 
</configuration> 
Comments

Based on your logs, you are running this with YARN, not MR1. Can you provide your yarn-site.xml? – tk421

Launch Hive in debug mode: hive --hiveconf hive.root.logger=DEBUG,console –

@tk421 yarn-site.xml updated – Alexis

Answers


I don't know your cluster configuration (nodes, memory, CPU); it would help if you could update the post with that information. In the meantime, my first guess is that your Tez configuration is wrong, so I suggest reducing the memory settings to smaller values (for tiny amounts of data a few MB should be enough). Also, try the mapred engine to narrow down the problem:

hive.execution.engine=mr 
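
For example, to switch the engine for just the current session (a minimal sketch; a session-level "set" is the usual way to try this without touching any config files):

set hive.execution.engine=mr; 
insert into value_count values (5, 1, '2017-05-06'); 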

Hive can use the Apache Tez execution engine instead of the venerable MapReduce engine to run the same commands. I won't go into the many benefits of using Tez that are mentioned here; instead, I want to make a simple suggestion: if Tez is not switched on by default in your environment, enable it by running the following at the start of your Hive queries:

set hive.execution.engine=tez; 
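
To make Tez the default for every session instead of setting it per query, the same property can also be placed in hive-site.xml (a sketch, assuming you manage hive-site.xml on the node running Hive/HiveServer2):

<property> 
    <name>hive.execution.engine</name> 
    <value>tez</value> 
</property> 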

In general, when you use

INSERT INTO ... VALUES ...

it tends to generate a small file every time you execute a statement. Since Hive does not create any constraints or indexes, these statements keep adding small files. If ACID properties are enabled on the table, Hive also tries to run compaction cycles in which it attempts to merge all of the small delta files into one large file. That process can sometimes be time-consuming.
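
As an aside, for an ORC table like value_count, small files that have already piled up can also be merged manually (a hedged sketch; ALTER TABLE ... CONCATENATE merges the ORC files of a non-transactional table or partition):

-- merge the table's small ORC files into fewer, larger files 
ALTER TABLE value_count CONCATENATE; 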

So it is better to create a file locally and then load it in one shot:

LOAD DATA LOCAL INPATH '/FILE/PATH' INTO TABLE TABLE_NAME ; 
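
A concrete sketch for this particular table (the staging table name, the tab delimiter, and the path /tmp/value_count.tsv are assumptions, not from the original post): since value_count is stored as ORC, a plain-text file cannot be loaded into it directly, so the usual pattern is to load into a text-format staging table first and then insert-select into the ORC table:

-- text-format staging table matching the target schema 
CREATE TABLE value_count_staging ( 
  `key` int, 
  `count` int, 
  `create_date` date) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
STORED AS TEXTFILE; 

-- LOAD DATA only moves the file; no parsing or conversion happens here 
LOAD DATA LOCAL INPATH '/tmp/value_count.tsv' INTO TABLE value_count_staging; 

-- a single job rewrites all rows as ORC in the target table 
INSERT INTO TABLE value_count SELECT * FROM value_count_staging; 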

Hope this helps.