2016-12-03
val final_df = sqlContext.sql("""
  select _xmlns, `md:Date`, `md:Creator`, struct(_ngr, _region, SetofValues) as Station
  from (
    select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, struct(_dataType, _period, Value) as SetofValues
    from (
      select _xmlns, `md:Date`, `md:Creator`, _ngr, _region, _dataType, _period, struct(_VALUE, _time) as Value
      from df_h a
      left outer join df_ds b on a.batchId = b.batchId
      left outer join df_dsv c on b.batchId = c.batchId
      left outer join df_nv d on c.batchId = d.batchId
    )
  )
""")
final_df.repartition(1).write.format("xml").option("rowTag","NewTag").save(output_path) 

The schema of the DataFrame that the Spark SQL code above saves as XML is as follows:

root 
|-- _xmlns: string (nullable = true) 
|-- md:Date: string (nullable = true) 
|-- md:Creator: string (nullable = true) 
|-- Station: struct (nullable = false) 
| |-- _ngr: string (nullable = true) 
| |-- _region: string (nullable = true) 
| |-- SetofValues: struct (nullable = false) 
| | |-- _dataType: string (nullable = true) 
| | |-- _period: string (nullable = true) 
| | |-- Value: struct (nullable = false) 
| | | |-- _VALUE: double (nullable = true) 
| | | |-- _time: string (nullable = true) 

When I save the DataFrame with the command above, I get the following XML file:

<ROWS> 
<NewTag xmlns="testing"> 
    <md:Date>2016-10-30</md:Date> 
    <md:Creator>USER_1</md:Creator> 
    <Station ngr="123456" region="North East"> 
     <SetofValues dataType="Total" period="15 min"> 
      <Value 3.509" time="05:30:00"></Value> 
     </SetofValues> 
    </Station> 
</NewTag> 
<NewTag xmlns="testing"> 
    <md:Date>2016-10-30</md:Date> 
    <md:Creator>USER_1</md:Creator> 
    <Station ngr="123456" region="North East"> 
     <SetofValues dataType="Total" period="15 min"> 
      <Value 2.6" time="05:45:00"></Value> 
     </SetofValues> 
    </Station> 
</NewTag> 
<NewTag xmlns="testing"> 
    <md:Date>2016-10-30</md:Date> 
    <md:Creator>USER_1</md:Creator> 
    <Station ngr="123456" region="North East"> 
     <SetofValues dataType="Total" period="15 min"> 
      <Value 1.111" time="06:00:00"></Value> 
     </SetofValues> 
    </Station> 
</NewTag> 
</ROWS> 

How can I get the following output instead, by collapsing the rows into an array?

<NewTag xmlns="testing"> 
<md:Date>2016-10-30</md:Date> 
<md:Creator>USER_1</md:Creator> 
<Station ngr="123456" region="North East"> 
    <SetofValues dataType="Total" period="15 min"> 
     <Value time="05:30:00">3.509</Value> 
     <Value time="05:45:00">2.6</Value> 
     <Value time="06:00:00">1.111</Value> 
    </SetofValues> 
</Station> 
</NewTag> 

I have not been able to convert the separate rows into an array so that the XML contains the repeated Value elements.

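Your data itself is not in the right shape, which is why it is written out this way. Run final_df.show and take a look. Transform the data correctly, group it the way you want, and then save it. –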

@AbhishekAnand Can you help with converting the rows into an array? – Naveen

Answer

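Late to the party, but in case anyone else is wondering: your schema holds exactly one Value per SetofValues, per Station, per root element, like...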

Root Station Set Value 
Root Station Set Value 
Root Station Set Value 
Root Station Set Value 

To get the output you want, you need to reduce by key so that Value becomes an array.

So after reducing on the three keys (root, station, set), your DataFrame would look like...

Root Station Set [Value, Value, Value, ...]
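
One way to sketch that reduction with the DataFrame API is to group on the root, station and set columns and gather the readings with collect_list. This is only an illustration under assumptions: it assumes Spark 2.x or later (where collect_list accepts struct columns), and the object name GroupValuesSketch, the helper groupValues and the sample rows are hypothetical stand-ins for the joined df_h / df_ds / df_dsv / df_nv result in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object GroupValuesSketch {

  // Collapse one-row-per-reading data into one row per (root, station, set),
  // gathering the (_VALUE, _time) pairs into an array column named Value.
  def groupValues(flat: DataFrame): DataFrame =
    flat
      .groupBy(
        col("_xmlns"), col("`md:Date`"), col("`md:Creator`"), // root-level key
        col("_ngr"), col("_region"),                          // station key
        col("_dataType"), col("_period"))                     // set key
      .agg(collect_list(struct(col("_VALUE"), col("_time"))).as("Value"))
      .select(
        col("_xmlns"), col("`md:Date`"), col("`md:Creator`"),
        struct(
          col("_ngr"), col("_region"),
          struct(col("_dataType"), col("_period"), col("Value")).as("SetofValues")
        ).as("Station"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]").appName("group-values-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the joined result in the question.
    val flat = Seq(
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 3.509, "05:30:00"),
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 2.6, "05:45:00"),
      ("testing", "2016-10-30", "USER_1", "123456", "North East", "Total", "15 min", 1.111, "06:00:00")
    ).toDF("_xmlns", "md:Date", "md:Creator", "_ngr", "_region", "_dataType", "_period", "_VALUE", "_time")

    val grouped = groupValues(flat)
    // Station.SetofValues.Value is now an array of (_VALUE, _time) structs.
    grouped.printSchema()

    spark.stop()
  }
}
```

The grouped DataFrame can then be saved with the same call as in the question, grouped.repartition(1).write.format("xml").option("rowTag", "NewTag").save(output_path); the expectation is that the XML writer emits an array field as repeated Value elements, but that is worth checking against the version in use. Note also that collect_list does not guarantee the order of the collected readings, so sort them explicitly if the time order matters.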