I set up a standalone Apache Spark cluster with 7 nodes, and I want to run the following Scala code (a simple Spark job):
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object MovieSimilarities1M {

  /** Our main function where the action happens */
  def main(args: Array[String]) {
    // Set the log level to only print errors
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Create a SparkContext without much actual configuration;
    // we want the cluster's config defaults to be used.
    val conf = new SparkConf()
    conf.setAppName("MovieSimilarities1M")
    val sc = new SparkContext(conf)

    // Load each line of the input CSV
    val input = sc.textFile("file:///home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv")

    // Map each line to a (customerID, price) pair
    val mappedInput = input.map(extractCustomerPricePairs)

    // Sum up spending per customer
    val totalByCustomer = mappedInput.reduceByKey((x, y) => x + y)

    // Flip to (total, customerID) so we can sort by total spent
    val flipped = totalByCustomer.map(x => (x._2, x._1))
    val totalByCustomerSorted = flipped.sortByKey()

    val results = totalByCustomerSorted.collect()

    // Print the results.
    results.foreach(println)
  }
}
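The helper extractCustomerPricePairs is not shown in the post. A plausible sketch, assuming each CSV line has the shape customerID,itemID,amountSpent (an assumption, not confirmed by the post):

```scala
// Hypothetical sketch of the missing helper: parse one CSV line into a
// (customerID, amountSpent) pair. The 3-column layout is an assumption.
def extractCustomerPricePairs(line: String): (Int, Float) = {
  val fields = line.split(",")
  (fields(0).toInt, fields(2).toFloat)
}

// Quick local check on a made-up line
println(extractCustomerPricePairs("44,8602,37.19"))
```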
The steps are:
1. I build the .jar file with SBT.
2. I submit the job with spark-submit *.jar.
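As a rough sketch, the two steps look like this (the class name is taken from the code above; the master URL and jar path are placeholders, not from the post):

```shell
# Step 1: build the jar with SBT
sbt package

# Step 2: submit to the standalone cluster
# (master URL and jar path are placeholders)
spark-submit \
  --class MovieSimilarities1M \
  --master spark://<master-host>:7077 \
  target/scala-2.11/*.jar
```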
But my executors cannot find the file passed to sc.textFile("file:///home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv").
The customer-orders.csv file is stored only on my main PC.
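For context on why this fails: with a file:// URL, every executor tries to open that path on its own local disk, so the file would have to exist at the same path on all 7 workers. A sketch of the usual alternatives (the hdfs:// URL is a placeholder, not from the post, and sc refers to the SparkContext created above):

```scala
// Option 1: copy customer-orders.csv to the identical local path on every
// worker node, and keep the existing file:// URL unchanged.

// Option 2: put the file on storage that all nodes can reach, e.g. HDFS
// or S3 (this URL is a placeholder):
val input = sc.textFile("hdfs://<namenode>:9000/data/customer-orders.csv")
```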
Full stack trace:
error:
[Stage 0:> (0 + 2) / 2]
17/09/25 17:32:35 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 0.0 (TID 5, 141.225.166.191, executor 2):
java.io.FileNotFoundException: File file:/home/ralfahad/LearnSpark/SBTCreate/customer-orders.csv does not exist
How can I solve this problem?
Please modify the code so it runs on the cluster.