2017-05-06 48 views

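Question:

I have an RDD of the form Array[Array[String]], and I need the combinations of the strings within each inner array. However, when I apply the map function I get the following error: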

java.io.NotSerializableException: scala.collection.TraversableOnce$FlattenOps$$anon$1 
Serialization stack: 
    - object not serializable (class: scala.collection.TraversableOnce$FlattenOps$$anon$1, value: non-empty iterator) 
    - element of array (index: 0) 
    - array (class [Lscala.collection.Iterator;, size 10) 
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40) 
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46) 
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100) 
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:324) 
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
    at java.lang.Thread.run(Thread.java:745) 

Background:

Initially I had the following:

Array[org.apache.spark.sql.Row] = Array([cyber crimes ;; cyber security ;; review ;; india ;; instances ;; state ;; issue], [civil rights ;; case ;; instances ;; frequency]) 

I cleaned it with the following code:

words.map(r => r(0).asInstanceOf[String].split("\\;;").map(_.trim)) 

The result is:

Array[Array[String]] = Array(Array(cyber crimes, cyber security, review, india, instances, state, issue), Array(civil society, instances, frequency)) 

Now I need all possible combinations of the strings in each array, like this:

Array[Array[String]] = Array(Array((cyber crimes, cyber security), (review, india), (instances, state), (issue,cyber crimes))....etc) 

When I apply map to it, I get the error shown above:

val combinations = cleanwords.map(r => r(0).asInstanceOf[String].combinations(2)) 

Can anyone help me get the desired result?

Answer


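The error probably occurs because you are trying to collect an RDD whose elements are iterators (produced by combinations), and an iterator is not serializable. In addition, you need to call combinations directly on the array rather than on its first element: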

cleanwords.map(_.combinations(2).toArray).collect 
// res47: Array[Array[Array[String]]] = Array(Array(Array(cyber crimes, cyber security), Array(cyber crimes, review), Array(cyber crimes, india) .. 
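
To illustrate why combinations should be called on the array itself, here is a minimal sketch in plain Scala (no Spark needed). The object name CombinationsOnArray and the helpers charPairs and wordPairs are hypothetical names made up for this example. Calling combinations on r(0), which is a single String, pairs up characters, whereas calling it on the array pairs up whole strings.

```scala
object CombinationsOnArray {
  // combinations(2) on a single String yields 2-character pieces,
  // which is what r(0).asInstanceOf[String].combinations(2) would produce.
  def charPairs(s: String): List[String] =
    s.combinations(2).map(_.mkString).toList

  // combinations(2) on the array yields pairs of whole strings.
  def wordPairs(words: Array[String]): List[List[String]] =
    words.combinations(2).map(_.toList).toList

  def main(args: Array[String]): Unit = {
    val inner = Array("cyber crimes", "cyber security", "review")
    println(charPairs(inner(0)).take(3)) // character pairs from "cyber crimes"
    println(wordPairs(inner))            // pairs of whole strings
  }
}
```

The results are converted to lists here only so that they print readably and compare by value.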

To return tuples instead:

cleanwords.map(_.combinations(2).map(x => (x(0), x(1))).toArray).collect 
// res60: Array[Array[(String, String)]] = Array(Array((cyber crimes,cyber security), (cyber crimes,review), (cyber crimes,india) ..
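
The serialization part of the problem can be checked without a cluster. The sketch below is an illustration under assumptions, not Spark's own code path: it uses plain Java serialization (the mechanism behind the JavaSerializerInstance frames in the stack trace) through a hypothetical helper named isJavaSerializable. It compares the iterator returned by combinations with the same data materialized by toArray.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object SerializationCheck {
  // Hypothetical helper: true if Java serialization accepts the value,
  // false if it throws NotSerializableException.
  def isJavaSerializable(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val words = Array("cyber crimes", "cyber security", "review")

    // The lazy iterator returned by combinations
    println(isJavaSerializable(words.combinations(2)))

    // The same pairs materialized as arrays
    println(isJavaSerializable(words.combinations(2).toArray))

    // The same pairs materialized as tuples
    println(isJavaSerializable(words.combinations(2).map(x => (x(0), x(1))).toArray))
  }
}
```

With the Scala standard library the first check should report false and the other two true, which is why adding toArray inside the map lets collect succeed.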