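I want to access a Hive Parquet table and load it into a pandas DataFrame. I am using pyspark, and my code, which fails with a Java heap space error, is as follows: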
import pyspark
import pandas
from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import HiveContext
conf = (SparkConf()
        .set("spark.driver.maxResultSize", "10g")
        .setAppName("buyclick")
        .setMaster("yarn-client")
        .set("spark.driver.memory", "4g")
        .set("spark.driver.cores", "4")
        .set("spark.executor.memory", "4g")
        .set("spark.executor.cores", "4")
        .set("spark.executor.extraJavaOptions", "-XX:-UseCompressedOops"))
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
results = sqlContext.sql("select * from buy_click_p")
res_pdf = results.toPandas()
This keeps failing no matter how I change the conf parameters, and every time it fails with a Java heap space error:
Exception in thread "task-result-getter-2" java.lang.OutOfMemoryError: Java heap space
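The `task-result-getter` thread in that trace runs on the driver, so the heap being exhausted is most likely the driver JVM's, filled while `toPandas()` collects every row. The Spark configuration documentation notes that in client mode `spark.driver.memory` should not be set through `SparkConf` inside the application, so one thing to try is passing it at launch time instead. A minimal sketch, assuming the script is started with plain `python` and that this pyspark version honours the `PYSPARK_SUBMIT_ARGS` environment variable; `build_submit_args` and the sizes `8g` and `4g` are illustrative and not taken from the code above:

```python
import os


def build_submit_args(driver_memory="8g", max_result_size="4g"):
    """Build a PYSPARK_SUBMIT_ARGS value (hypothetical helper, sizes are assumptions)."""
    parts = [
        "--driver-memory", driver_memory,
        "--conf", "spark.driver.maxResultSize=" + max_result_size,
        "pyspark-shell",  # expected as the final token when launching from Python
    ]
    return " ".join(parts)


# Set this before SparkContext is created, i.e. before the driver JVM starts.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_submit_args()
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

With the variable set this way, the `spark.driver.memory` and `spark.driver.maxResultSize` entries in `SparkConf` would no longer be needed; whether 8g is enough for this table is untested.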
Here is some additional information about the environment:
Cloudera CDH version : 5.9.0
Hive version : 1.1.0
Spark Version : 1.6.0
Hive table size : hadoop fs -du -s -h /path/to/hive/table/folder --> 381.6 M 763.2 M
Free memory on box : free -m
             total       used       free     shared    buffers     cached
Mem:         23545      11721      11824         12        258       1773
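
As a rough sanity check on the figures above, the configured sizes can be compared with what `free -m` reports. The sketch below is plain Python arithmetic, not a Spark API; `parse_size_mb` is a hypothetical helper, and treating buffers and cache as reclaimable memory is an assumption:

```python
def parse_size_mb(size):
    """Convert a Spark-style size string such as '4g' or '512m' to megabytes."""
    units = {"m": 1, "g": 1024}
    return int(size[:-1]) * units[size[-1].lower()]


# Values copied from the `free -m` output above (in MB).
free_mb, buffers_mb, cached_mb = 11824, 258, 1773
reclaimable_mb = free_mb + buffers_mb + cached_mb  # 13855

# Values copied from the SparkConf in the question.
driver_memory_mb = parse_size_mb("4g")   # 4096
max_result_mb = parse_size_mb("10g")     # 10240

print("memory that could be given to the driver (MB):", reclaimable_mb)
print("maxResultSize exceeds driver heap:", max_result_mb > driver_memory_mb)
```

On these numbers roughly 13.5 GB is available on the box, while `spark.driver.maxResultSize` (10g) is larger than `spark.driver.memory` (4g), so the result-size limit is unlikely to trigger before the 4 GB driver heap fills up.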