val data = sc.textFile("data.txt")
// The schema is encoded in a string.
val schemaString = "text code"
// Import Row.
import org.apache.spark.sql.Row
// Import Spark SQL data types.
import org.apache.spark.sql.types.{StructType, StructField, StringType}
// Generate the schema based on the schema string.
val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
// Convert records of the RDD (data) to Rows.
val rowRDD = data.map(_.split(",")).map(p => Row(p(0), p(1).trim))
// Apply the schema to the RDD.
val dataDataFrame = sqlContext.createDataFrame(rowRDD, schema)
// Register the DataFrame as a table.
dataDataFrame.registerTempTable("data")
// SQL statements can be run using the sql method provided by sqlContext.
// Note: the query must reference a column that exists in the schema ("text" or "code").
val results = sqlContext.sql("SELECT text FROM data")
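To inspect what the query returned, the resulting DataFrame can be printed or collected; a minimal follow-up, assuming the Spark 1.x sqlContext used above:

// Show the first rows of the result in tabular form.
results.show()
// Or bring the rows back to the driver (only sensible for small results).
results.collect().foreach(println)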
Appending the data from all of the files is not a good idea, because all of the data would be loaded into memory. Reading one file at a time would be a better approach.
However, depending on your use case, if you do need the data from all of the files, you will have to append the RDDs somehow, as sketched below.
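A minimal sketch of appending RDDs, assuming the files share the same comma-separated "text,code" layout (the file names here are hypothetical):

// Read each file into its own RDD, then append them with union.
val part1 = sc.textFile("data1.txt")
val part2 = sc.textFile("data2.txt")
val combined = part1.union(part2)
// Alternatively, sc.textFile accepts a glob pattern, which reads all matching files at once:
val all = sc.textFile("data*.txt")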
Hope that answers your question! Cheers! :)
Did you even try to look at the documentation? – Niemand