I am running the code below to create a graph with GraphX in Apache Spark. Building the VertexRDD gives me a type mismatch error.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId
// Load the graph file from HDFS.
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/data/google-plus/2309.graph")
// Take the first 20 characters of each line, which hold the node ID.
val result = lines.map(line => line.substring(0, 20))
// Pair each node ID, parsed as a Long, with the Long value 1.
val result2 = result.map(word => (word.toLong, 1L))
// This is the line where I get the error.
val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)
I get the following error:
error: type mismatch;
found : org.apache.spark.rdd.RDD[(Long, Long)]
required: Seq[?]
Error occurred in an application involving default arguments.
val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)
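The compiler message itself points at the cause: `sc.parallelize` expects a local `Seq`, while `result2` is already an RDD. A minimal sketch of the difference follows, assuming a local Spark master and two made-up input lines whose node ID is the leading run of digits (the real file format is not shown in the question, and `VertexRddSketch` and `toVertices` are hypothetical names):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD

object VertexRddSketch {
  // Build (vertexId, attribute) pairs from lines that already live in an RDD.
  // Assumption: each line starts with a numeric ID that fits in a Long.
  def toVertices(lines: RDD[String]): RDD[(VertexId, Long)] =
    lines.map(line => (line.takeWhile(_.isDigit).toLong, 1L))

  def main(args: Array[String]): Unit = {
    // Assumption: local master, for illustration only.
    val conf = new SparkConf().setAppName("vertex-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // parallelize turns a local Seq into an RDD (sample lines are made up).
      val lines: RDD[String] = sc.parallelize(Seq("101 102", "102 103"))
      // The result of map is already an RDD, so it is assigned directly,
      // without wrapping it in sc.parallelize again.
      val vertexRDD: RDD[(VertexId, Long)] = toVertices(lines)
      vertexRDD.collect().foreach(println)
    } finally sc.stop()
  }
}
```

Applied to the code in the question, this would mean assigning `result2` to `vertexRDD` directly.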
When I run the code I get the following: (0 + 9) / 59] 16/12/16 18:12:26 WARN TaskSetManager: Lost task 8.0 in stage 3.0 (TID 126, moon07.eecs.qmul.ac.uk): java.lang.NumberFormatException: For input string: "10867043655226952823" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:592) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230) ...
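@RhysCopperthwaite Oh, of course. The maximum Long value (9223372036854775807) has 19 digits, so a 20-digit string cannot be parsed; limit your substring to 18 characters to be sure it fits. GraphX does not support vertices with String IDs, so you must have numeric IDs that fit in a Long. If you need to, you can also try `line.hashCode()` instead of `line.substring()`.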
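To make the overflow concrete, here is a small sketch in plain Scala (no Spark needed). The ID string is the one from the stack trace; `LongIdCheck` and `parseId` are hypothetical names used only for illustration:

```scala
import scala.util.Try

object LongIdCheck {
  // Parse a decimal string as a Long; None when it is not a valid Long.
  def parseId(s: String): Option[Long] = Try(s.toLong).toOption

  def main(args: Array[String]): Unit = {
    val raw = "10867043655226952823"   // 20 digits, taken from the stack trace
    println(Long.MaxValue)             // 9223372036854775807 (19 digits)
    println(parseId(raw))              // None: the value is larger than Long.MaxValue
    println(parseId(raw.take(18)))     // Some(108670436552269528)
  }
}
```

Note that truncating to 18 digits only guarantees the value parses; two different 20-digit IDs that share the same first 18 digits would collapse into one vertex.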
@RhysCopperthwaite Using `hashCode()` may not be the best way to define the IDs. You need to make sure every node has a unique numeric ID that fits in a Long.
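One possible way to get unique IDs, sketched on a local collection: `hashCode()` returns a 32-bit Int, so distinct strings can collide, whereas numbering the distinct names cannot. `UniqueIdSketch` and `assignIds` are hypothetical names; on an RDD, the built-in `zipWithIndex` or `zipWithUniqueId` methods could play a similar role.

```scala
object UniqueIdSketch {
  // Give each distinct node name its own Long ID, in order of first appearance.
  def assignIds(names: Seq[String]): Map[String, Long] =
    names.distinct.zipWithIndex.map { case (name, idx) => (name, idx.toLong) }.toMap

  def main(args: Array[String]): Unit = {
    // "Aa" and "BB" are different strings with the same hashCode (2112).
    println("Aa".hashCode == "BB".hashCode)   // true
    val ids = assignIds(Seq("Aa", "BB", "Aa"))
    println(ids("Aa") != ids("BB"))           // true
  }
}
```

The trade-off is that the name-to-ID mapping has to be kept around so that edges can be translated to the same IDs.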