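How do I remove the parentheses "(" and ")" from the output of the Spark job below? In other words, how do I remove the parentheses around each record when calling saveAsTextFile on an RDD[(String,Int)]?
They cause a problem when I try to read the Spark output with a Pig script.
My code: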
scala> val words = Array("HI","HOW","ARE")
words: Array[String] = Array(HI, HOW, ARE)
scala> val wordsRDD = sc.parallelize(words)
wordsRDD: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:23
scala> val keyvalueRDD = wordsRDD.map(elem => (elem,1))
keyvalueRDD: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[1] at map at <console>:25
scala> val wordcountRDD = keyvalueRDD.reduceByKey((x,y) => x+y)
wordcountRDD: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at <console>:27
scala> wordcountRDD.saveAsTextFile("/user/cloudera/outputfiles")
Output of the code above:
hadoop dfs -cat /user/cloudera/outputfiles/part*
(HOW,1)
(ARE,1)
(HI,1)
But I want the Spark output to be stored without parentheses, as below:
HOW,1
ARE,1
HI,1
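Now I want to read the above output with a Pig script. Pig treats "(HOW" as the first atom and "1)" as the second atom.
Is there any way to get rid of the parentheses in the Spark code itself? I do not want to apply
the fix in the Pig script's LOAD statement.
Pig script: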
records = LOAD '/user/cloudera/outputfiles' USING PigStorage(',') AS (word:chararray);
dump records;
Pig output:
((HOW)
((ARE)
((HI)
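One possible way to drop the parentheses on the Spark side, offered as a minimal sketch rather than a definitive fix: saveAsTextFile writes each element using its toString, and a Scala tuple's toString includes the surrounding parentheses, so mapping each pair to a plain string before saving should produce lines such as HOW,1. The helper name toCsvLine is hypothetical (it is not part of the code above), and the sample Seq only stands in for wordcountRDD so the snippet can run without Spark.

```scala
// Hypothetical helper (an assumption, not from the original code): renders one
// (word, count) pair as a comma-separated line with no surrounding parentheses.
def toCsvLine(pair: (String, Int)): String = {
  val (word, count) = pair
  s"$word,$count"
}

// Plain Scala collection standing in for wordcountRDD, so this runs without Spark.
val sample = Seq(("HOW", 1), ("ARE", 1), ("HI", 1))
val lines = sample.map(toCsvLine)
lines.foreach(println)

// In spark-shell, the same function would be applied to the RDD before saving:
// wordcountRDD.map(toCsvLine).saveAsTextFile("/user/cloudera/outputfiles")
```

With the records written in this form, PigStorage(',') in the LOAD statement should be able to split each line on the comma without picking up "(HOW" or "1)".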