I think spark-csv would help, but here is a pure Scala approach.
When you say "blank space", I assume you mean there is literally some whitespace there, and that the line does not simply end with a comma.
case class Doctor(age: Int, part: String, day: String, value: Double)

val line = "9,elbow,Mon Aug 15 00:00:00 EDT 3399, "

// Split on commas, trim each field, and replace blank fields with "0.0"
val data = line.split(",").map(_.trim).map {
  case "" => "0.0"
  case x  => x
}

val doc = Doctor(data(0).toInt, data(1), data(2), data(3).toDouble)
Output:
data: Array[String] = Array(9, elbow, Mon Aug 15 00:00:00 EDT 3399, 0.0)
doc: Doctor = Doctor(9,elbow,Mon Aug 15 00:00:00 EDT 3399,0.0)
As far as Spark is concerned... this produces an RDD[Doctor]:
case class Doctor(age: Int, part: String, day: String, value: Double)

// Parse each line into a Doctor, defaulting blank numeric fields to 0.0
sc.textFile(fileName).map { line =>
  val data = line.split(",").map(_.trim).map {
    case "" => "0.0"
    case x  => x
  }
  Doctor(data(0).toInt, data(1), data(2), data(3).toDouble)
}
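As mentioned at the top, spark-csv could handle this through a DataFrame instead. A minimal sketch, assuming the com.databricks:spark-csv package is on the classpath, a header-less file, and that the blank field is a literal single space (the nullValue setting and the column names are assumptions, not part of the original code):

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types._

val sqlContext = new SQLContext(sc)

// Declare the expected columns up front so the value column is read as Double
val schema = StructType(Seq(
  StructField("age", IntegerType, nullable = false),
  StructField("part", StringType, nullable = true),
  StructField("day", StringType, nullable = true),
  StructField("value", DoubleType, nullable = true)
))

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .schema(schema)
  .option("nullValue", " ")   // assumption: the blank field is exactly one space
  .load(fileName)

// Replace nulls in the numeric column with 0.0, mirroring the RDD version above
val cleaned = df.na.fill(0.0, Seq("value"))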
Can you do this on an RDD? –
Sure. Something like 'sc.textFile("file.txt").map { line => ... }'? –
That is what I have been trying, but I cannot keep all the elements and drop the empty strings, since the field is a Double and we cannot call double.isEmpty() –
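To illustrate the point the answer relies on (the names below are only for illustration): test for the blank while the value is still a String, before converting it to Double, or wrap the conversion in scala.util.Try so a failed parse falls back to a default:

import scala.util.Try

val field = " ".trim                                     // a blank field after trimming

// Option 1: test emptiness on the String, then convert
val value1 = if (field.isEmpty) 0.0 else field.toDouble

// Option 2: attempt the conversion and fall back to 0.0 on failure
val value2 = Try(field.toDouble).getOrElse(0.0)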