2016-04-19

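Bluemix Spark: spark-submit fails when downloading stderr and stdout?

I am using the Spark service in IBM Bluemix. I am trying to launch a piece of Java code with the spark-submit.sh script to run some Spark processing.

My command line is: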

./spark-submit.sh --vcap ./VCAP.json --deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi \
--master https://169.54.219.20 ~/Documents/Spark/JavaSparkPi.jar

I am using the latest version of spark-submit.sh (as of yesterday).

./spark-submit.sh --version 
spark-submit.sh VERSION : '1.0.0.0.20160330.1' 

This worked fine a few weeks ago (with the old spark-submit.sh), but now I get the following error:

Downloading stdout_1461024849908170118 
    % Total % Received % Xferd Average Speed Time Time  Time Current 
           Dload Upload Total Spent Left Speed 
    0 89 0 89 0  0  56  0 --:--:-- 0:00:01 --:--:-- 108 
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to  stdout_1461024849908170118 

Downloading stderr_1461024849908170118 
    % Total % Received % Xferd Average Speed Time Time  Time Current 
           Dload Upload Total Spent Left Speed 
    0 89 0 89 0  0  50  0 --:--:-- 0:00:01 --:--:-- 108 
Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stderr to  stderr_1461024849908170118 

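Any idea what I am doing wrong? Thanks in advance.

EDIT:

Looking at the log file, I found that the problem is not really in downloading stdout and stderr, but in submitting the job.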

{ 
    "action" : "SubmissionStatusResponse", 
    "driverState" : "FAILED", 
    "message" : "Exception from the cluster: 
org.apache.spark.SparkException: Failed to change container CWD 
org.apache.spark.deploy.master.EgoApplicationManager.egoDriverExitCallback(EgoApplicationManager.scala:168) 
org.apache.spark.deploy.master.MasterScheduleDelegatorDriver.onContainerExit(MasterScheduleDelegatorDriver.scala:144) 
org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.handleActivityFinish(ResourceManagerEGOSlot.scala:555) 
org.apache.spark.deploy.master.resourcemanager.ResourceManagerEGOSlot.callbackContainerStateChg(ResourceManagerEGOSlot.scala:525) 
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:158) 
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$$anonfun$callbackContainerStateChg$1.apply(ResourceManager.scala:157) 
scala.Option.foreach(Option.scala:236) 
org.apache.spark.deploy.master.resourcemanager.ResourceCallbackManager$.callbackContainerStateChg(ResourceManager.scala:157)", 
    "serverSparkVersion" : "1.6.0", 
    "submissionId" : "driver-20160420043532-0027-6e579720-2c9d-428f-b2c7-6613f4845146", 
    "success" : true 
} 
driverStatus is FAILED 
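
Note that the response above reports "success" : true even though driverState is FAILED, so the success field alone apparently does not reflect the outcome of the job. A minimal sketch of a check that looks at driverState instead, assuming the status response has been saved to a file. The function names and the file are hypothetical (they are not part of spark-submit.sh), and FINISHED is assumed to be the state of a completed driver, as in Spark standalone's DriverState:

```shell
#!/bin/sh
# Hypothetical helpers, not part of spark-submit.sh.

# Print the first driverState value found in a saved status response.
# $1: path to a file containing the JSON response.
driver_state() {
    sed -n 's/.*"driverState"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' "$1" | head -n 1
}

# Return 0 only if the driver reached FINISHED (assumed success state),
# regardless of what the "success" field says.
driver_ok() {
    [ "$(driver_state "$1")" = "FINISHED" ]
}
```

For example, `driver_ok status.json || echo "driver did not finish"` would flag the response shown above, where status.json is whatever file the response was saved to.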

EDIT2:

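In the end, simply creating a brand-new instance of the Spark service solved my job submission problem. My job now runs and finishes after a few seconds.

But I still get an error message when trying to download the stdout and stderr files.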

Downloading stdout_1461156506108609180 
% Total % Received % Xferd Average Speed Time Time  Time Current 
Dload Upload Total Spent Left Speed 
    0 90 0 90 0  0  61  0 --:--:-- 0:00:01 --:--:-- 125 
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180 

Downloading stderr_1461156506108609180 
% Total % Received % Xferd Average Speed Time Time  Time Current 
Dload Upload Total Spent Left Speed 
    0 90 0 90 0  0  56  0 --:--:-- 0:00:01 --:--:-- 109 
Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stderr to stderr_1461156506108609180 

Any ideas?

Answer

I found that the old spark-submit was trying to fetch stdout and stderr from the workdir folder...

Failed to download from workdir/driver-20160418191414-0020-5e7fb175-6856-4980-97bc-8e8aa0d1f137/stdout to  stdout_1461024849908170118 

While the new spark-submit (downloaded yesterday) tries to download them from the workdir2 folder...

Failed to download from workdir2/driver-20160420074922-0008-1400fc20-95c1-442d-9c37-32de3a7d1f0a/stdout to stdout_1461156506108609180 

The folder used is determined by the variable SS_SPARK_WORK_DIR, which is initialized in spark-submit:

if [ -z ${SS_SPARK_WORK_DIR} ]; then SS_SPARK_WORK_DIR="workdir2"; fi # Work directory on spark cluster 

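I changed the value to workdir, and now everything works. I have since downloaded a new copy of spark-submit (today's) from the Bluemix site, and this problem has been fixed there. The variable now points to workdir.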

So, if anything fails, make sure you have the latest spark-submit script from Bluemix.
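
Because the SS_SPARK_WORK_DIR assignment quoted above is guarded by a -z test, it should also be possible to override the folder from the environment instead of editing the script. A sketch under one assumption that is not verified here: nothing earlier in spark-submit.sh resets SS_SPARK_WORK_DIR. resolve_work_dir is a hypothetical helper that only reproduces the guard so its behaviour can be seen in isolation:

```shell
#!/bin/sh
# Hypothetical helper: reproduces the guard from spark-submit.sh and prints
# the work directory that would end up being used.
resolve_work_dir() {
    # Same logic as the script: keep a value that is already set,
    # otherwise fall back to the script's default.
    if [ -z "${SS_SPARK_WORK_DIR}" ]; then SS_SPARK_WORK_DIR="workdir2"; fi
    printf '%s\n' "${SS_SPARK_WORK_DIR}"
}
```

If the assumption holds, prefixing the original command with `SS_SPARK_WORK_DIR=workdir` would make the older script download from workdir without any change to the file.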