2017-05-29

Hive Runtime Error: Map local work exhausted memory

I am trying to join two ORC tables in Hive, but I get an error. Here is the query:

select t1.num as num, t1.product as Product, t2.value as OldValue, t1.value as NewValue from test_new t1 LEFT OUTER JOIN test_old t2 ON t1.num=t2.num and t1.product=t2.product where t2.value is NULL and t1.value is not NULL or t1.value<>t2.value; 

Error:

2017-05-29 11:19:27,157 INFO [main]: mr.ExecDriver (SessionState.java:printInfo(911)) - Execution log at: /tmp/alex/kaliamoorthya_20170529111919_6621dd64-7a5e-4411-abda-b28fddab8bdc.log 
2017-05-29 11:19:27,320 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities> 
2017-05-29 11:19:27,321 INFO [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo 
2017-05-29 11:19:27,462 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1496056767320 end=1496056767462 duration=142 from=org.apache.hadoop.hive.ql.exec.Utilities> 
2017-05-29 11:19:27,472 INFO [main]: mr.MapredLocalTask (SessionState.java:printInfo(911)) - 2017-05-29 11:19:27 Starting to launch local task to process map join; maximum memory = 1908932608 
2017-05-29 11:19:27,549 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for t2 created 
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0] 
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized 
2017-05-29 11:19:27,550 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS 
2017-05-29 11:19:27,550 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK 
2017-05-29 11:19:27,550 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1] 
2017-05-29 11:19:27,551 INFO [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1908932608 
2017-05-29 11:19:27,582 INFO [main]: persistence.HashMapWrapper (HashMapWrapper.java:calculateTableSize(94)) - Key count from statistics is -1; setting map size to 100000 
2017-05-29 11:19:27,582 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(394)) - Initialization Done 1 HASHTABLESINK 
2017-05-29 11:19:27,582 INFO [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 0 TS 
2017-05-29 11:19:27,582 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for t2 initialized 
2017-05-29 11:19:28,059 INFO [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1174)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir 
2017-05-29 11:19:28,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=OrcGetSplits from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl> 
2017-05-29 11:19:28,098 INFO [main]: orc.OrcInputFormat (OrcInputFormat.java:generateSplitsInfo(961)) - FooterCacheHitRatio: 0/4 
2017-05-29 11:19:28,098 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=OrcGetSplits start=1496056768062 end=1496056768098 duration=36 from=org.apache.hadoop.hive.ql.io.orc.ReaderImpl> 
2017-05-29 11:19:28,209 INFO [main]: orc.OrcRawRecordMerger (OrcRawRecordMerger.java:<init>(430)) - min key = null, max key = null 
2017-05-29 11:19:28,209 INFO [main]: orc.ReaderImpl (ReaderImpl.java:rowsOptions(526)) - Reading ORC rows from hdfs://nameservice1/user/hive/warehouse/alex_tmp.db/test_old/000000_0 with {include: [true, true, true, true], offset: 0, length: 9223372036854775807} 
2017-05-29 11:19:28,646 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 200000 Hashtable size: 199999 Memory usage: 130784248 percentage: 0.069 
2017-05-29 11:19:28,708 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 300000 Hashtable size: 299999 Memory usage: 159462144 percentage: 0.084 
2017-05-29 11:19:28,784 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 400000 Hashtable size: 399999 Memory usage: 207258624 percentage: 0.109 
2017-05-29 11:19:28,843 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 500000 Hashtable size: 499999 Memory usage: 235936520 percentage: 0.124 
2017-05-29 11:19:28,903 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 600000 Hashtable size: 599999 Memory usage: 274173712 percentage: 0.144 
2017-05-29 11:19:28,965 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:28 Processing rows: 700000 Hashtable size: 699999 Memory usage: 312410896 percentage: 0.164 
2017-05-29 11:19:29,059 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 800000 Hashtable size: 799999 Memory usage: 359036720 percentage: 0.188 
2017-05-29 11:19:29,126 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 900000 Hashtable size: 899999 Memory usage: 397273912 percentage: 0.208 
2017-05-29 11:19:29,196 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 425951800 percentage: 0.223 
2017-05-29 11:19:29,263 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 464188992 percentage: 0.243 
2017-05-29 11:19:29,333 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 502426176 percentage: 0.263 
2017-05-29 11:19:29,401 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:29 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 540663360 percentage: 0.283 
2017-05-29 11:19:32,752 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 485809696 percentage: 0.254 
2017-05-29 11:19:32,817 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 524582216 percentage: 0.275 
2017-05-29 11:19:32,937 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 580131976 percentage: 0.304 
2017-05-29 11:19:32,998 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:32 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 618904496 percentage: 0.324 
2017-05-29 11:19:33,061 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 647983888 percentage: 0.339 
2017-05-29 11:19:33,124 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 686756400 percentage: 0.36 
2017-05-29 11:19:33,188 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 725528920 percentage: 0.38 
2017-05-29 11:19:33,253 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2100000 Hashtable size: 2099999 Memory usage: 764301440 percentage: 0.40 
2017-05-29 11:19:33,316 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2200000 Hashtable size: 2199999 Memory usage: 793380824 percentage: 0.416 
2017-05-29 11:19:33,380 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2300000 Hashtable size: 2299999 Memory usage: 832153336 percentage: 0.436 
2017-05-29 11:19:33,445 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2400000 Hashtable size: 2399999 Memory usage: 870925856 percentage: 0.456 
2017-05-29 11:19:33,510 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2500000 Hashtable size: 2499999 Memory usage: 909698376 percentage: 0.477 
2017-05-29 11:19:33,574 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:33 Processing rows: 2600000 Hashtable size: 2599999 Memory usage: 938777776 percentage: 0.492 
2017-05-29 11:19:38,930 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38 Processing rows: 2700000 Hashtable size: 2699999 Memory usage: 924140056 percentage: 0.484 
2017-05-29 11:19:38,996 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:38 Processing rows: 2800000 Hashtable size: 2799999 Memory usage: 960610440 percentage: 0.503 
2017-05-29 11:19:39,063 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 2900000 Hashtable size: 2899999 Memory usage: 997080808 percentage: 0.522 
2017-05-29 11:19:39,134 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3000000 Hashtable size: 2999999 Memory usage: 1033551200 percentage: 0.541 
2017-05-29 11:19:39,203 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3100000 Hashtable size: 3099999 Memory usage: 1070021576 percentage: 0.561 
2017-05-29 11:19:39,392 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3200000 Hashtable size: 3199999 Memory usage: 1140046400 percentage: 0.597 
2017-05-29 11:19:39,456 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3300000 Hashtable size: 3299999 Memory usage: 1176516784 percentage: 0.616 
2017-05-29 11:19:39,519 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3400000 Hashtable size: 3399999 Memory usage: 1212987168 percentage: 0.635 
2017-05-29 11:19:39,583 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3500000 Hashtable size: 3499999 Memory usage: 1249457552 percentage: 0.655 
2017-05-29 11:19:39,646 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3600000 Hashtable size: 3599999 Memory usage: 1285927936 percentage: 0.674 
2017-05-29 11:19:39,710 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3700000 Hashtable size: 3699999 Memory usage: 1322398320 percentage: 0.693 
2017-05-29 11:19:39,774 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3800000 Hashtable size: 3799999 Memory usage: 1358868704 percentage: 0.712 
2017-05-29 11:19:39,837 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 3900000 Hashtable size: 3899999 Memory usage: 1395339088 percentage: 0.731 
2017-05-29 11:19:39,904 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 4000000 Hashtable size: 3999999 Memory usage: 1431809456 percentage: 0.75 
2017-05-29 11:19:39,973 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:39 Processing rows: 4100000 Hashtable size: 4099999 Memory usage: 1468279832 percentage: 0.769 
2017-05-29 11:19:40,041 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40 Processing rows: 4200000 Hashtable size: 4199999 Memory usage: 1504750200 percentage: 0.788 
2017-05-29 11:19:40,113 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:40 Processing rows: 4300000 Hashtable size: 4299999 Memory usage: 1538933512 percentage: 0.806 
2017-05-29 11:19:48,786 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4400000 Hashtable size: 4399999 Memory usage: 1496365384 percentage: 0.784 
2017-05-29 11:19:48,850 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4500000 Hashtable size: 4499999 Memory usage: 1532580448 percentage: 0.803 
2017-05-29 11:19:48,915 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4600000 Hashtable size: 4599999 Memory usage: 1568795512 percentage: 0.822 
2017-05-29 11:19:48,979 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:48 Processing rows: 4700000 Hashtable size: 4699999 Memory usage: 1605010584 percentage: 0.841 
2017-05-29 11:19:49,044 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 4800000 Hashtable size: 4799999 Memory usage: 1641225648 percentage: 0.86 
2017-05-29 11:19:49,108 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 4900000 Hashtable size: 4899999 Memory usage: 1677440712 percentage: 0.879 
2017-05-29 11:19:49,171 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 5000000 Hashtable size: 4999999 Memory usage: 1713655784 percentage: 0.898 
2017-05-29 11:19:49,235 INFO [main]: exec.HashTableSinkOperator (SessionState.java:printInfo(911)) - 2017-05-29 11:19:49 Processing rows: 5100000 Hashtable size: 5099999 Memory usage: 1749870856 percentage: 0.917 
2017-05-29 11:19:49,246 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(354)) - Hive Runtime Error: Map local work exhausted memory 
org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionException: 2017-05-29 11:19:49 Processing rows: 5100000 Hashtable size: 5099999 Memory usage: 1749870856 percentage: 0.917 
    at org.apache.hadoop.hive.ql.exec.mapjoin.MapJoinMemoryExhaustionHandler.checkMemoryStatus(MapJoinMemoryExhaustionHandler.java:99) 
    at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.processOp(HashTableSinkOperator.java:249) 
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) 
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:409) 
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:380) 
    at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346) 
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
    at java.lang.reflect.Method.invoke(Method.java:606) 
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136) 

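I tried setting the map memory and the reduce memory to 22000, but still no luck. After searching the internet, I found people suggesting setting the property hive.auto.convert.join=false in Hive to get past the error above, and with that setting my query started running.

I am not sure whether running the query this way costs performance. Will the performance still be the same? Is there another way to solve the problem? Please suggest some ways to improve the query's performance.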

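To start with, the query logic looks wrong. Given a record from `t1` for which all matches from `t2` have the same `value` as its own `value`, the record will be filtered out.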

If `t2.value is NULL`, then it can be neither equal nor unequal to anything: `or t1.value <> t2.value` – leftjoin
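
The two comments above are about how the WHERE clause is evaluated. As a sketch, assuming the intent is to list rows of test_new that are new or whose value changed, here is the same query with explicit parentheses. They only spell out the grouping that operator precedence already gives (AND binds tighter than OR), so the result should be unchanged. The table and column names are the ones from the question; the stated intent is an assumption.

```sql
-- Sketch only: same tables and columns as the question's query.
-- Assumed intent: report rows of test_new that are new or changed.
SELECT t1.num     AS num,
       t1.product AS Product,
       t2.value   AS OldValue,
       t1.value   AS NewValue
FROM test_new t1
LEFT OUTER JOIN test_old t2
  ON  t1.num = t2.num
  AND t1.product = t2.product
WHERE (t2.value IS NULL AND t1.value IS NOT NULL)  -- no old value, or no matching old row
   OR (t1.value <> t2.value);                      -- both values present and different
```

Because a comparison with NULL evaluates to NULL, the second branch can only match when both values are non-NULL, which is why the first branch is needed at all. Note that a row whose value went from non-NULL in test_old to NULL in test_new is matched by neither branch.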

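Answer

Your first and safest option is to set hive.auto.convert.join=false. This costs some performance, because you no longer benefit from map joins. How big that compromise is depends entirely on your use case and your data size. Another option is to tune hive.auto.convert.join.noconditionaltask.size, which, according to https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization, "enables the user to control what size table can fit in memory". Finding the right threshold can be challenging.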

P.S. Just remember that for hive.auto.convert.join.noconditionaltask.size to take effect, hive.auto.convert.join.noconditionaltask needs to be true (which it is by default).
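
As a rough sketch, the two options could be applied as session-level settings before running the query. The property names are the ones mentioned above. The size value is purely illustrative and is not a recommendation, since a suitable threshold depends on the data.

```sql
-- Option 1: disable automatic map-join conversion for this session,
-- so the join runs as a regular (shuffle) join.
set hive.auto.convert.join=false;

-- Option 2: keep automatic conversion, but limit how large the small
-- table may be before Hive stops converting the join to a map join.
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
-- Threshold in bytes; 10000000 is an illustrative value only.
set hive.auto.convert.join.noconditionaltask.size=10000000;
```

Use one option or the other, not both. To see which join strategy Hive picked, the query can be prefixed with EXPLAIN: a plan that still builds the hash table locally contains a Map Join Operator, whereas a regular join does not. The local task in the log above ran in a JVM with a maximum heap of 1908932608 bytes, which may explain why raising the map and reduce container memory had no visible effect on this error.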