2016-12-24 87 views
0

我想寫一個火花轉換代碼將下面的數據轉換爲以下類的對象列表,我對scala和spark完全陌生並試圖分割數據並將它們放入案例班,但我無法將其追回。請求你的幫助。將數據轉換爲火花scala中的類對象列表

數據:

FirstName,LastName,Country,match,Goals 
Cristiano,Ronaldo,Portugal,Match1,1 
Cristiano,Ronaldo,Portugal,Match2,1 
Cristiano,Ronaldo,Portugal,Match3,0 
Cristiano,Ronaldo,Portugal,Match4,2 
Lionel,Messi,Argentina,Match1,1 
Lionel,Messi,Argentina,Match2,2 
Lionel,Messi,Argentina,Match3,1 
Lionel,Messi,Argentina,Match4,2 

所需的輸出:

PLayerStats{ String FirstName, 
    String LastName, 
    String Country, 
    Map <String,Int> matchandscore 
} 

回答

0

第一線轉換成鍵值對說(Cristiano, rest of data)然後應用groupByKeyreduceByKey也能正常工作,然後嘗試將鍵值對數據轉換通過放置值將groupByKey或reduceByKey應用到您的類中之後。藉助着名的單詞計數程序。

http://spark.apache.org/examples.html

0

你可以嘗試一些如下:

val file = sc.textFile("myfile.csv") 

val df = file.map(line => line.split(",")).  // split line by comma 
       filter(lineSplit => lineSplit(0) != "FirstName"). // filter out first row 
       map(lineSplit => {   // transform lines 
       (lineSplit(0), lineSplit(1), lineSplit(2), Map((lineSplit(3), lineSplit(4).toInt)))}). 
       toDF("FirstName", "LastName", "Country", "MatchAndScore")   

df.schema 
// res34: org.apache.spark.sql.types.StructType = StructType(StructField(FirstName,StringType,true), StructField(LastName,StringType,true), StructField(Country,StringType,true), StructField(MatchAndScore,MapType(StringType,IntegerType,false),true)) 

df.show 

+---------+--------+---------+----------------+ 
|FirstName|LastName| Country| MatchAndScore| 
+---------+--------+---------+----------------+ 
|Cristiano| Ronaldo| Portugal|Map(Match1 -> 1)| 
|Cristiano| Ronaldo| Portugal|Map(Match2 -> 1)| 
|Cristiano| Ronaldo| Portugal|Map(Match3 -> 0)| 
|Cristiano| Ronaldo| Portugal|Map(Match4 -> 2)| 
| Lionel| Messi|Argentina|Map(Match1 -> 1)| 
| Lionel| Messi|Argentina|Map(Match2 -> 2)| 
| Lionel| Messi|Argentina|Map(Match3 -> 1)| 
| Lionel| Messi|Argentina|Map(Match4 -> 2)| 
+---------+--------+---------+----------------+ 
1

假設你已經加載數據到一個名爲data一個RDD[String]

case class PlayerStats(FirstName: String, LastName: String, Country: String, matchandscore: Map[String, Int]) 

val result: RDD[PlayerStats] = data 
    .filter(!_.startsWith("FirstName")) // remove header 
    .map(_.split(",")).map { // map into case classes 
    case Array(fn, ln, cntry, mn, g) => PlayerStats(fn, ln, cntry, Map(mn -> g.toInt)) 
    } 
    .keyBy(p => (p.FirstName, p.LastName)) // key by player 
    .reduceByKey((p1, p2) => p1.copy(matchandscore = p1.matchandscore ++ p2.matchandscore)) 
    .map(_._2) // remove key 
+0

謝謝!! ti工作 – Bhushan

+0

@Bhushan很高興幫助 - 您可以接受/提出答案,讓未來的讀者知道這是有用的 –