2015-10-13

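Hive - Out of Memory Exception - Java heap space

I am running a Hive insert on Parquet files (created with Spark). The insert uses a PARTITION clause. At the end of the job, while the console is printing messages such as "Loading partition {=XYZ, =123, =ABC}", a Java heap space exception is thrown.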

java.lang.OutOfMemoryError: Java heap space 
     at java.util.HashMap.createEntry(HashMap.java:901) 
     at java.util.HashMap.addEntry(HashMap.java:888) 
     at java.util.HashMap.put(HashMap.java:509) 
     at org.apache.hadoop.hive.metastore.api.Partition.<init>(Partition.java:229) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.deepCopy(HiveMetaStoreClient.java:1356) 
     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo(HiveMetaStoreClient.java:1003) 
     at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:606) 
     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) 
     at com.sun.proxy.$Proxy9.getPartitionWithAuthInfo(Unknown Source) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1611) 
     at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1565) 
     at org.apache.hadoop.hive.ql.exec.StatsTask.getPartitionsList(StatsTask.java:403) 
     at org.apache.hadoop.hive.ql.exec.StatsTask.aggregateStats(StatsTask.java:150) 
     at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:117) 
     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) 
     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) 
     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508) 
     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275) 
     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093) 
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916) 
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906) 
     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268) 
     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220) 
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423) 
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359) 
     at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456) 
     at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466) 
     at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748) 
     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686) 
     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) 

I have run the job with the properties below, trying both higher and lower values, but I end up with this error every time.

Properties changed:

set mapred.map.tasks=100; 
set mapred.reduce.tasks=100; 
set mapreduce.map.java.opts=-Xmx4096m; 
set mapreduce.reduce.java.opts=-Xmx4096m; 
set hive.exec.max.dynamic.partitions.pernode=100000; 
set hive.exec.max.dynamic.partitions=100000; 

Please suggest what is going wrong here. The Hive version is 0.13.

hive-env.sh

if [ "$SERVICE" = "cli" ]; then
    if [ -z "$DEBUG" ]; then
        export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms12288m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
    else
        export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms12288m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
    fi
fi

# The heap size of the JVM started by the hive shell script can be controlled via:
# 
export HADOOP_HEAPSIZE=4096 

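The problem lies in the number of dynamic partitions being created. The query currently creates ~38,000 partitions. To get it working, we removed one partition level, which brought the number of dynamic partitions down to ~1,400, and the query then completed. But the main question about memory remains. – Kunal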
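
The number of dynamic partitions an insert creates equals the number of distinct value combinations across the partition columns, which is why dropping one level can shrink the count so sharply. A minimal sketch for estimating that count before running the insert, assuming a tab-separated extract of the partition columns is available as a file. The function name `count_partitions`, the file layout, and the sample values are hypothetical and not from the thread:

```shell
#!/bin/sh
# Count distinct combinations of the first N tab-separated columns of a file.
# Each distinct combination corresponds to one dynamic partition.
# count_partitions and the file layout are illustrative assumptions.
count_partitions() {
    file="$1"     # tab-separated extract of the partition columns
    levels="$2"   # how many leading columns act as partition levels
    cut -f "1-$levels" "$file" | sort -u | wc -l | tr -d ' '
}

# Tiny made-up sample: three partition columns per row, one duplicate row.
sample=$(mktemp)
printf 'xyz\t123\tabc\nxyz\t123\tdef\nxyz\t456\tabc\nxyz\t123\tabc\n' > "$sample"

count_partitions "$sample" 3   # all three levels: prints 3
count_partitions "$sample" 2   # one level removed: prints 2
rm -f "$sample"
```

Comparing the result against `hive.exec.max.dynamic.partitions` and `hive.exec.max.dynamic.partitions.pernode` gives a rough idea of how many partitions the job will have to load at the end.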

Answers


This may be related to issue HIVE-10149. Try setting [...] to true.


I have the same problem, and this does not work :(