2017-10-17 58 views
0

Spark: import a DataFrame into MongoDB (Scala)

Given the following Spark DataFrame:

Name,LicenseID_1,TypeCode_1,State_1,LicenseID_2,TypeCode_2,State_2,LicenseID_3,TypeCode_3,State_3  
"John","123ABC",1,"WA","456DEF",2,"FL","789GHI",3,"CA" 
"Jane","ABC123",5,"AZ","DEF456",7,"CO","GHI789",8,"GA" 

How can I use Scala with Spark to write this into MongoDB as a collection of documents like the following:

{ "Name" : "John", 
    "Licenses" : 
    [ 
     {"LicenseID":"123ABC","TypeCode":"1","State":"WA" }, 
     {"LicenseID":"456DEF","TypeCode":"2","State":"FL" }, 
     {"LicenseID":"789GHI","TypeCode":"3","State":"CA" } 
    ] 
}, 

{ "Name" : "Jane", 
    "Licenses" : 
    [ 
     {"LicenseID":"ABC123","TypeCode":"5","State":"AZ" }, 
     {"LicenseID":"DEF456","TypeCode":"7","State":"CO" }, 
     {"LicenseID":"GHI789","TypeCode":"8","State":"GA" } 
    ] 
} 

I tried to do this, but I got stuck at the following code:

val customSchema = StructType(Array(
  StructField("Name", StringType, true),
  StructField("LicenseID_1", StringType, true),
  StructField("TypeCode_1", StringType, true),
  StructField("State_1", StringType, true),
  StructField("LicenseID_2", StringType, true),
  StructField("TypeCode_2", StringType, true),
  StructField("State_2", StringType, true),
  StructField("LicenseID_3", StringType, true),
  StructField("TypeCode_3", StringType, true),
  StructField("State_3", StringType, true)))
val license = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").schema(customSchema).load("D:\\test\\test.csv")
case class License(LicenseID: String, TypeCode: String, State: String)
case class Data(Name: String, Licenses: Array[License])
val transformedData = license.map(data => Data(data(0), Array(License(data(1), data(2), data(3)), License(data(4), data(5), data(6)), License(data(7), data(8), data(9)))))

<console>:46: error: type mismatch; 
found : Any 
required: String 
     val transformedData = license.map(data => Data(data(0),Array(License(data(1),data(2),data(3)),License(data(4),data(5),data(6)),License(data(7),data(8),data(9))))) 
... 
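The error comes from the fact that `Row`'s `apply(i)` returns `Any`, while the case-class fields expect `String`. A minimal sketch of the mismatch and its fix (plain Scala, no Spark; the `Seq[Any]` here just stands in for a `Row`):

```scala
// A Spark Row's apply(i) returns Any; this Seq[Any] stands in for one CSV row.
val row: Seq[Any] = Seq("John", "123ABC", 1, "WA")

case class License(LicenseID: String, TypeCode: String, State: String)

// row(1) is typed Any, so passing it straight to a String field fails to compile.
// Converting each cell first (e.g. with .toString) satisfies the constructor:
val license = License(row(1).toString, row(2).toString, row(3).toString)

println(license) // License(123ABC,1,WA)
```

On a real `Row`, `getString(i)` would also work when the column is already a string, but `.toString` handles cells of any type.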
+1

Please state your question more precisely. Perhaps post some of the code you have already tried. –

+0

As you can see, because similar information (three separate sets of license information) is spread across multiple columns, I want to import this into MongoDB with "Licenses" as the attribute name and, as its value, an array holding the name-value pairs of each license's information. – SYL

+0

Have you tried writing any code to do this? If so, please post it and point out where the problem is. If not, please try. –

Answers

0

Not sure what your question is; here is an example of how to load and save data with Spark and MongoDB.

sparkSession.loadFromMongoDB() // Uses the SparkConf for configuration 
sparkSession.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://example.com/database.collection"))) // Uses the ReadConfig 

sparkSession.read.mongo() 
sparkSession.read.format("com.mongodb.spark.sql").load() 

// Set custom options: 
sparkSession.read.mongo(customReadConfig) 
sparkSession.read.format("com.mongodb.spark.sql").options(customReadConfig.asOptions).load() 

The connector provides the ability to persist data into MongoDB.

MongoSpark.save(centenarians.write.option("collection", "hundredClub"))
MongoSpark.load[Character](sparkSession, ReadConfig(Map("collection" -> "data"), Some(ReadConfig(sparkSession)))).show()

Alternative way to save data:

dataFrameWriter.mongo()
dataFrameWriter.format("com.mongodb.spark.sql").save()
+0

Reading from and writing to Mongo is not my question. The question is how to restructure the data (e.g., turning similar columns into an array of key-value pairs) and then save it to MongoDB so that it appears as in the sample JSON. – SYL

+0

https://stackoverflow.com/questions/39389700/spark-dataframe-is-saved-to-mongodb-in-wrong-format –

0

Adding toString fixed the problem, and I was able to save to MongoDB in the format I wanted.

val transformedData = license.map(data => Data(data(0).toString, Array(License(data(1).toString, data(2).toString, data(3).toString), License(data(4).toString, data(5).toString, data(6).toString), License(data(7).toString, data(8).toString, data(9).toString))))
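For rows with more license columns, the same conversion can be written without spelling out every index. A sketch in plain Scala (no Spark; the `Seq[Any]` stands in for a `Row`, and the case classes match those in the question):

```scala
case class License(LicenseID: String, TypeCode: String, State: String)
case class Data(Name: String, Licenses: Array[License])

// One flat CSV row as Any values, like a Spark Row
val row: Seq[Any] = Seq("Jane", "ABC123", 5, "AZ", "DEF456", 7, "CO", "GHI789", 8, "GA")

// Instead of indexing 1..9 by hand, convert every trailing cell to String
// and group the cells three at a time into License objects
val licenses = row.tail.map(_.toString).grouped(3).map {
  case Seq(id, code, state) => License(id, code, state)
}.toArray

val data = Data(row.head.toString, licenses)
println(data.Licenses.map(_.LicenseID).mkString(",")) // ABC123,DEF456,GHI789
```

This scales to any number of `LicenseID_n`/`TypeCode_n`/`State_n` column triples without changing the mapping code.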