
Answers


No, it won't be unpersisted automatically.

Why? Because although it may look to you like the RDD is no longer needed, Spark's model is not to materialize an RDD until it is needed for a transformation, so it is actually very hard to say "I won't need this RDD anymore." Even for you it can be very tricky, because of situations like the following:

JavaRDD<T> rddUnion = sc.parallelize(new ArrayList<T>()); // create empty for merging
for (int i = 0; i < 10; i++)
{
    JavaRDD<T2> rdd = sc.textFile(inputFileNames[i]);
    rdd.cache(); // Since it will be used twice, cache.
    rdd.map(...).filter(...).saveAsTextFile(outputFileNames[i]); // Transform and save, rdd materializes
    rddUnion = rddUnion.union(rdd.map(...).filter(...)); // Do another transform to T and merge by union
    rdd.unpersist(); // Now it seems not needed. (But is needed actually)
}

// Here, rddUnion actually materializes, and needs all 10 rdds that were already unpersisted. So, rebuilding all 10 rdds will occur.
rddUnion.saveAsTextFile(mergedFileName);

Credits for the code sample go to the spark-user mailing list.
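To make the pitfall concrete, here is a minimal standalone sketch (not from the original answer) of one way to avoid the rebuild: keep a handle to each cached RDD and unpersist only after rddUnion has materialized. The file names and the trivial map/filter lambdas are hypothetical placeholders for the elided logic in the example above.

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class UnpersistAfterMaterialize {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "unpersist-sketch");

        List<JavaRDD<String>> cached = new ArrayList<>();
        JavaRDD<String> rddUnion = sc.parallelize(new ArrayList<String>());

        for (int i = 0; i < 10; i++) {
            JavaRDD<String> rdd = sc.textFile("input-" + i + ".txt"); // hypothetical paths
            rdd.cache(); // still used twice, so cache

            // First use: transform and save a per-file output.
            rdd.map(String::trim).filter(s -> !s.isEmpty())
               .saveAsTextFile("output-" + i);

            // Second use: another transform, merged via union.
            rddUnion = rddUnion.union(rdd.map(String::toUpperCase).filter(s -> s.length() > 3));

            cached.add(rdd); // keep the handle; do NOT unpersist yet
        }

        // rddUnion materializes here and reads the 10 inputs from the cache.
        rddUnion.saveAsTextFile("merged-output");

        // Only now is it safe to drop the cached blocks.
        for (JavaRDD<String> rdd : cached) {
            rdd.unpersist();
        }

        sc.stop();
    }
}

The only change relative to the answer's example is where unpersist() happens: once the union has been written out, nothing downstream depends on the cached blocks anymore, so dropping them triggers no rebuild.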


Hi @C4stor, thanks for your answer, but looking at https://github.com/apache/spark/pull/126 and ContextCleaner.scala, it seems Spark does do some automatic cleanup of RDDs. So I'm not sure how and when Spark decides it is safe to unpersist an RDD. –
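As a hedged illustration of the automatic path the comment refers to: the ContextCleaner unpersists a cached RDD only after its driver-side reference has been garbage-collected, so the timing is non-deterministic, and an explicit unpersist() remains the predictable option. The snippet below is an assumption-laden sketch with a hypothetical input path, not code from the thread.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CleanerSketch {
    public static void main(String[] args) throws InterruptedException {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "cleaner-sketch");

        JavaRDD<String> lines = sc.textFile("input.txt"); // hypothetical path
        lines.cache();
        System.out.println(lines.count()); // materializes and caches the blocks

        // Explicit alternative (deterministic): lines.unpersist();

        // GC-driven path: dropping the last driver-side reference makes the
        // cached RDD a candidate for automatic cleanup; the ContextCleaner
        // unpersists it after the object is garbage-collected, at some later,
        // unspecified time.
        lines = null;
        System.gc();        // may trigger it sooner, but there is no guarantee
        Thread.sleep(1000); // purely to give the asynchronous cleaner a chance

        sc.stop();
    }
}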