Remote job on Apache Spark (Java)

I have set up a new standalone Apache Spark server on a freshly installed Ubuntu server. I am trying to send my first job to it, but I am not very successful.
Here is what I do locally:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

SparkConf conf = new SparkConf().setAppName("myFirstJob").setMaster("local[*]");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
javaSparkContext.setLogLevel("WARN");
SQLContext sqlContext = new SQLContext(javaSparkContext);
System.out.println("Hello, Remote Spark v." + javaSparkContext.version());

// load the JSON file and count the schools per district
DataFrame df = sqlContext.read().option("dateFormat", "yyyy-mm-dd")
    .json("./src/main/resources/north-carolina-school-performance-data.json");
df = df.withColumn("district", df.col("fields.district"));
df = df.groupBy("district").count().orderBy(df.col("district"));
df.show(150);
It works: it displays the names of the school districts in NC along with the number of schools in each district:
Hello, Remote Spark v.1.6.1
+--------------------+-----+
| district|count|
+--------------------+-----+
|Alamance-Burlingt...| 34|
|Alexander County ...| 10|
|Alleghany County ...| 4|
|Anson County Schools| 10|
| Ashe County Schools| 5|
|Asheboro City Sch...| 8|
...
Now, if I change the first line to:
SparkConf conf = new SparkConf().setAppName("myFirstJob").setMaster("spark://10.0.100.120:7077");
it gets as far as:
Hello, Remote Spark v.1.6.1
16/07/12 10:58:34 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
16/07/12 10:58:49 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
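In case it is relevant: my understanding is that with a remote master the driver also has to ship the application jar to the workers, and that the job must not ask for more memory or cores than the workers offer. The following is only a sketch of what I think that would look like; the jar path and the resource values are placeholders, and I have not confirmed that this is the actual problem:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// hypothetical variant of the remote setup (jar path and resource values are placeholders)
SparkConf conf = new SparkConf()
        .setAppName("myFirstJob")
        .setMaster("spark://10.0.100.120:7077")
        // ship the compiled application classes to the workers
        .setJars(new String[] { "target/myFirstJob-1.0.jar" })
        // keep the resource request below what a single worker offers
        .set("spark.executor.memory", "512m")
        .set("spark.cores.max", "2");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);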
The first strange thing (to me) is that the server runs Spark 1.6.2; I was expecting to see 1.6.2 as the version number.
Then I have a look at the UI:
If I click on the application, 20160712105816-0011, I get:
Clicking on any of the other links takes me back to my local Apache Spark instance.
After clicking around, I can see:
If I look at the logs on the server, I see:
16/07/12 10:37:00 INFO Master: Registered app myFirstJob with ID app-20160712103700-0009
16/07/12 10:37:03 INFO Master: Received unregister request from application app-20160712103700-0009
16/07/12 10:37:03 INFO Master: Removing app app-20160712103700-0009
16/07/12 10:37:03 INFO Master: 10.0.100.100:54396 got disassociated, removing it.
16/07/12 10:37:03 INFO Master: 10.0.100.100:54392 got disassociated, removing it.
16/07/12 10:50:44 INFO Master: Registering app myFirstJob
16/07/12 10:50:44 INFO Master: Registered app myFirstJob with ID app-20160712105044-0010
16/07/12 10:51:20 INFO Master: Received unregister request from application app-20160712105044-0010
16/07/12 10:51:20 INFO Master: Removing app app-20160712105044-0010
16/07/12 10:51:20 INFO Master: 10.0.100.100:54682 got disassociated, removing it.
16/07/12 10:51:20 INFO Master: 10.0.100.100:54680 got disassociated, removing it.
16/07/12 10:58:16 INFO Master: Registering app myFirstJob
16/07/12 10:58:16 INFO Master: Registered app myFirstJob with ID app-20160712105816-0011
All of this seems fine to me...
I have an earlier (unresolved) question, Apache Spark Server installation requires Hadoop? Not automatically installed?, in the same environment, but this is a completely different, and much smaller, application.
Any clue about what is going on?
Thanks - that would make sense. I just need to understand how to create a configuration where I have a master and slaves. There is probably something I missed in http://spark.apache.org/docs/1.6.2/spark-standalone.html. – jgp
If your master and slave are on the same machine, just put the machine's IP address in 'conf/slaves'. – Dikei
It works - I now have another issue, but that part works - tx! I thought slavery was more modern than that: you are a master, but when there are no slaves, you do the job yourself... – jgp
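For reference, a minimal version of what Dikei describes, assuming Spark is installed under /opt/spark (a placeholder path), would be to list the machine's own address in the slaves file:

# /opt/spark/conf/slaves - one worker host per line;
# using the master's own IP makes the same machine run both the master and one worker
10.0.100.120

and then restart the standalone daemons (for example with /opt/spark/sbin/stop-all.sh followed by /opt/spark/sbin/start-all.sh) so that a worker registers with the master and shows up in the cluster UI. Once a worker is registered with enough resources, the "Initial job has not accepted any resources" warning should go away.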