val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`,
         struct(_ngr, _region, SetofValues) as Station
  from (
    select _xmlns, `md:Date`, `md:Creator`, _ngr, _region,
           struct(_dataType, _period, Value) as SetofValues
    from (
      select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period,
             struct(_VALUE, _time) as Value
      from df_h a
      left outer join df_ds b on a.batchId = b.batchId
      left outer join df_dsv c on b.batchId = c.batchId
      left outer join df_nv d on c.batchId = d.batchId
    )
  )
""")
final_df.repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path)
The schema of the DataFrame that the lines above save as XML is as follows:
root
 |-- _xmlns: string (nullable = true)
 |-- md:Date: string (nullable = true)
 |-- md:Creator: string (nullable = true)
 |-- Station: struct (nullable = false)
 |    |-- _ngr: string (nullable = true)
 |    |-- _region: string (nullable = true)
 |    |-- SetofValues: struct (nullable = false)
 |    |    |-- _dataType: string (nullable = true)
 |    |    |-- _period: string (nullable = true)
 |    |    |-- Value: struct (nullable = false)
 |    |    |    |-- _VALUE: double (nullable = true)
 |    |    |    |-- _time: string (nullable = true)
When I save the DataFrame with the command above, I get the following XML file:
<ROWS>
  <NewTag xmlns="testing">
    <md:Date>2016-10-30</md:Date>
    <md:Creator>USER_1</md:Creator>
    <Station ngr="123456" region="North East">
      <SetofValues dataType="Total" period="15 min">
        <Value 3.509" time="05:30:00"></Value>
      </SetofValues>
    </Station>
  </NewTag>
  <NewTag xmlns="testing">
    <md:Date>2016-10-30</md:Date>
    <md:Creator>USER_1</md:Creator>
    <Station ngr="123456" region="North East">
      <SetofValues dataType="Total" period="15 min">
        <Value 2.6" time="05:45:00"></Value>
      </SetofValues>
    </Station>
  </NewTag>
  <NewTag xmlns="testing">
    <md:Date>2016-10-30</md:Date>
    <md:Creator>USER_1</md:Creator>
    <Station ngr="123456" region="North East">
      <SetofValues dataType="Total" period="15 min">
        <Value 1.111" time="06:00:00"></Value>
      </SetofValues>
    </Station>
  </NewTag>
</ROWS>
How can I get the following output instead, by collecting the rows into an array?
<NewTag xmlns="testing">
  <md:Date>2016-10-30</md:Date>
  <md:Creator>USER_1</md:Creator>
  <Station ngr="123456" region="North East">
    <SetofValues dataType="Total" period="15 min">
      <Value time="05:30:00">3.509</Value>
      <Value time="05:45:00">2.6</Value>
      <Value time="06:00:00">1.111</Value>
    </SetofValues>
  </Station>
</NewTag>
I have not been able to convert the separate rows into an array so that the XML contains the values as a list.
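Your data itself is not in the right shape, which is why it is written out this way. Run final_df.show and take a look at it. Transform the data correctly, group it the way you want, and then save it. –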
@AbhishekAnand Can you help with converting the rows into an array? – Naveen
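One way to do what the comment suggests (group the data, then save) is to aggregate the readings with collect_list before building the nested structs. The following is a minimal sketch, not a tested solution for the original job. It assumes Spark 2.x or later (where collect_list accepts struct columns) and uses SparkSession instead of the question's sqlContext. The object name GroupValuesSketch, the helpers sampleReadings and nestReadings, and the hard-coded sample rows are hypothetical stand-ins for the joined df_h / df_ds / df_dsv / df_nv result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object GroupValuesSketch {

  // Hypothetical flat input: one row per reading, mimicking the joined result
  // from the question (column names taken from the question's schema).
  def sampleReadings(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 3.509, "05:30:00"),
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 2.6, "05:45:00"),
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 1.111, "06:00:00")
    ).toDF("_xmlns", "md:Date", "md:Creator", "_ngr", "_region",
           "_dataType", "_period", "_VALUE", "_time")
  }

  // Collapse all readings that share the same header/station/set columns into
  // a single row whose Value column is an array of (_VALUE, _time) structs,
  // then wrap the columns into the nested Station/SetofValues structs.
  def nestReadings(flat: DataFrame): DataFrame = {
    val grouped = flat
      .groupBy("_xmlns", "md:Date", "md:Creator", "_ngr", "_region", "_dataType", "_period")
      .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))

    grouped.select(
      col("_xmlns"), col("md:Date"), col("md:Creator"),
      struct(
        col("_ngr"), col("_region"),
        struct(col("_dataType"), col("_period"), col("Value")).as("SetofValues")
      ).as("Station")
    )
  }

  def main(args: Array[String]): Unit = {
    // Local session only for this sketch; a real job would reuse its own session.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("group-values-sketch")
      .getOrCreate()

    val finalDf = nestReadings(sampleReadings(spark))
    finalDf.printSchema()   // Value should now be array<struct<_VALUE, _time>>
    finalDf.show(false)     // a single row holding all three readings

    spark.stop()
  }
}
```

With the spark-xml package on the classpath, the result could then be written with the same repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path) call as in the question; an array of structs is expected to come out as repeated Value elements inside SetofValues. Note that collect_list does not guarantee the order of the collected elements, so the readings may need an explicit sort if their order matters.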