2016-12-16 167 views
1

I am running the code below, trying to create a graph with GraphX in Apache Spark. Building the vertex RDD gives me a type mismatch error.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx.VertexId

//loads file from the array
val lines = sc.textFile("hdfs://moonshot-ha-nameservice/data/google-plus/2309.graph");

//maps lines and takes the first 21 characters of each line which is the node.
val result = lines.map(line => line.substring(0,20))

//creates a new variable with each node followed by a long .
val result2 = result.map(word => (word,1L).toLong)

//where i am getting an error
val vertexRDD: RDD[(Long,Long)] = sc.parallelize(result2)

I get the following error:

error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Long, Long)]
 required: Seq[?]
Error occurred in an application involving default arguments.
     val vertexRDD: RDD[(Long, Long)] = sc.parallelize(result2)

Answer

3

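First, your maps can be simplified to the following code:

val vertexRDD: RDD[(Long, Long)] =
    lines.map(line => (line.substring(0, 17).toLong, 1L))

Now, on to your error: you cannot call sc.parallelize on an RDD. As the error message says, it requires a local collection (a Seq), and result2 is already an RDD. Your vertex RDD is therefore already defined by result2, so you can create your graph directly from result2 and your edgesRDD: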

val g = Graph(result2, edgesRDD) 

Or, if you take my suggestion:

val g = Graph(vertexRDD, edgesRDD) 
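
The difference between a local collection and an RDD can be sketched as follows. This is only an illustration under assumptions: the local[*] master, the zero-padded sample lines, the single sample edge, and the names VertexRddSketch and buildGraph are all made up here and do not come from the question. The sketch uses sc.parallelize only on local Seq values, and passes the result of map straight to Graph.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object VertexRddSketch {
  // Hypothetical helper: builds a two-vertex graph from made-up sample data.
  def buildGraph(sc: SparkContext): Graph[Long, Int] = {
    // Made-up lines: an 18-digit zero-padded node ID followed by a payload.
    val sampleLines: Seq[String] = Seq(1L, 2L).map(id => f"$id%018d payload")

    // parallelize takes a local Seq and returns an RDD.
    val lines: RDD[String] = sc.parallelize(sampleLines)

    // map on an RDD already yields an RDD, so no second parallelize is needed.
    val vertexRDD: RDD[(VertexId, Long)] =
      lines.map(line => (line.substring(0, 18).toLong, 1L))

    // Made-up single edge between the two sample vertices.
    val edgesRDD: RDD[Edge[Int]] = sc.parallelize(Seq(Edge(1L, 2L, 1)))

    Graph(vertexRDD, edgesRDD)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, for illustration only.
    val conf = new SparkConf().setAppName("vertex-rdd-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val g = buildGraph(sc)
    println(s"vertices=${g.vertices.count()}, edges=${g.edges.count()}")
    sc.stop()
  }
}
```

In the spark-shell the sc value already exists, so only the body of buildGraph would be needed there.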

When I run the code I get the following: (0 + 9) / 59] 16/12/16 18:12:26 WARN TaskSetManager: Lost task 8.0 in stage 3.0 (TID 126, moon07.eecs.qmul.ac.uk): java.lang.NumberFormatException: For input string: "10867043655226952823" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:592) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:230) ...


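@RhysCopperthwaite Oh, of course: the maximum Long value has 19 digits, so your substring should be limited to 18 characters. GraphX does not support vertices with strings as IDs, so you must have numeric IDs that fit into a Long. If needed, you could also try 'line.hashCode()' instead of 'line.substring()'.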
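
The digit-count argument can be checked in plain Scala without Spark. The sketch below uses the ID from the stack trace in the earlier comment; the object name LongIdSketch and the helper parseId are hypothetical.

```scala
import scala.util.Try

object LongIdSketch {
  // Hypothetical helper: parse a decimal string as a Long, None if it does not fit.
  def parseId(s: String): Option[Long] = Try(s.toLong).toOption

  def main(args: Array[String]): Unit = {
    // Long.MaxValue is 9223372036854775807, which has 19 digits.
    println(Long.MaxValue.toString.length)

    // The 20-digit ID from the stack trace is too large for a Long.
    val raw = "10867043655226952823"
    println(parseId(raw))

    // Every 18-digit decimal string is below Long.MaxValue, so this parses.
    println(parseId(raw.take(18)))
  }
}
```

Note that truncating an ID changes it, so two different nodes whose IDs share the same first 18 digits would end up as the same vertex.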


@RhysCopperthwaite Using 'hashCode()' may not be the best way to define IDs. You need to make sure that every node has a unique numeric ID that fits into a Long variable.
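
One possible way to meet that uniqueness requirement, sketched with plain Scala collections: number the distinct string IDs instead of hashing or truncating them. The names UniqueIdSketch and assignIds are hypothetical, and the second sample ID is made up. On RDDs, distinct() together with zipWithIndex() or zipWithUniqueId() could play the same role.

```scala
object UniqueIdSketch {
  // Hypothetical helper: give each distinct string ID its own Long.
  def assignIds(rawIds: Seq[String]): Map[String, Long] =
    rawIds.distinct.zipWithIndex.map { case (s, i) => (s, i.toLong) }.toMap

  def main(args: Array[String]): Unit = {
    // hashCode is a 32-bit Int, so different strings can collide.
    println("Aa".hashCode == "BB".hashCode) // true

    // First and third entries are the ID from the stack trace; the second is made up.
    val rawIds = Seq("10867043655226952823", "10867043655226952824", "10867043655226952823")
    val ids = assignIds(rawIds)

    // Two distinct strings map to two distinct Long IDs.
    println(ids)
  }
}
```

The mapping has to be kept around (or joined back against the edge list) so that edges refer to the same assigned IDs as the vertices.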