Here is one approach. First, set up the example:
val prefix = "/home/tmp/date="
val dates = Array("20140901", "20140902", "20140903", "20140904")
val datesRDD = sc.parallelize(dates, 2)
Adding the prefix is easy:
val datesWithPrefixRDD = datesRDD.map(s => prefix + s)
datesWithPrefixRDD.foreach(println)
This produces (the line order can vary between runs, because the partitions are printed in parallel):
/home/tmp/date=20140901
/home/tmp/date=20140903
/home/tmp/date=20140902
/home/tmp/date=20140904
But you asked for a single string. The most obvious first attempt has a problem with extra commas:
val bad = datesWithPrefixRDD.fold("")((s1, s2) => s1 + ", " + s2)
println(bad)
This produces:
, , /home/tmp/date=20140901, /home/tmp/date=20140902, , /home/tmp/date=20140903, /home/tmp/date=20140904
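The problem is that Spark's RDD fold() method starts the concatenation with the empty string I supplied once for each partition and once more for the RDD as a whole. But we can handle the empty string: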
val good = datesWithPrefixRDD.fold("")((s1, s2) =>
  s1 match {
    case "" => s2
    case s  => s + ", " + s2
  })
println(good)
Then we get:
/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
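To see where the extra commas in the first attempt come from, here is a rough sketch that imitates fold() with plain Scala collections instead of Spark. The `partitions` value and the `simulatedFold` helper are my own stand-ins, not Spark APIs. They assume the four paths are split evenly across two partitions, and that the zero value is applied once per partition and once more when the per-partition results are merged.

```scala
// Plain Scala collections standing in for a two-partition RDD (no Spark needed).
object FoldCommaDemo {
  // Hypothetical stand-in: each inner Seq plays the role of one partition.
  val partitions: Seq[Seq[String]] = Seq(
    Seq("/home/tmp/date=20140901", "/home/tmp/date=20140902"),
    Seq("/home/tmp/date=20140903", "/home/tmp/date=20140904")
  )

  // Imitates fold(zero)(op): fold each partition starting from zero,
  // then fold the per-partition results starting from zero again.
  def simulatedFold(zero: String)(op: (String, String) => String): String = {
    val perPartition = partitions.map(_.foldLeft(zero)(op))
    perPartition.foldLeft(zero)(op)
  }

  // The first attempt: always inserts a separator, even next to the empty zero value.
  val naive: (String, String) => String = (s1, s2) => s1 + ", " + s2

  // The fixed version: skips the separator when the left side is still empty.
  val guarded: (String, String) => String =
    (s1, s2) => if (s1.isEmpty) s2 else s1 + ", " + s2

  def main(args: Array[String]): Unit = {
    println(simulatedFold("")(naive))   // one stray ", " per zero value
    println(simulatedFold("")(guarded)) // clean comma-separated list
  }
}
```

With two partitions the zero value is used three times, which accounts for the three stray separators in the first output.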
Edit: Actually, reduce() gives a tidier answer, because it avoids the "extra comma" problem altogether:
val alternative = datesWithPrefixRDD.reduce((s1, s2) => s1 + ", " + s2)
println(alternative)
Again we get:
/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
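One caveat, as far as I understand it: Spark documents fold() and reduce() as expecting an operator that is both associative and commutative, and string concatenation is not commutative, so the order of the pieces is not strictly guaranteed. If the order matters and the list is small enough to fit on the driver, a different technique is to collect the elements and join them locally. The sketch below shows only the joining step in plain Scala. `joinPaths` is a hypothetical helper of mine, and the hard-coded array stands in for the result of `datesWithPrefixRDD.collect()`.

```scala
object JoinOnDriver {
  // Joins already-collected paths; mkString places the separator only between elements.
  def joinPaths(paths: Array[String]): String = paths.mkString(", ")

  def main(args: Array[String]): Unit = {
    // Stand-in for datesWithPrefixRDD.collect(), which returns an Array[String].
    val collected = Array(
      "/home/tmp/date=20140901",
      "/home/tmp/date=20140902",
      "/home/tmp/date=20140903",
      "/home/tmp/date=20140904"
    )
    println(joinPaths(collected))
  }
}
```

Inside a Spark session this would be `datesWithPrefixRDD.collect().mkString(", ")`. It also copes with an empty RDD, where reduce() would throw an exception.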
It works, thanks a lot! – 2014-09-27 21:11:32