2015-10-22

How to parse an XML file in Apache Spark?

How can I parse an XML file that contains a list of identical nodes in Apache Spark? Example of the file:

<?xml version="1.0" encoding="UTF-8"?> 
<osm version="0.6" generator="CGImap 0.4.0 (25361 thorn-02.openstreetmap.org)" copyright="OpenStreetMap and contributors" attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/"> 
<bounds minlat="48.8306100" minlon="2.3310900" maxlat="48.8337900" maxlon="2.3389100"/> 
<node id="430785" visible="true" version="8" changeset="24482318" timestamp="2014-08-01T14:24:53Z" user="dhuyp" uid="1779584" lat="48.8340725" lon="2.3309196"/> 
<node id="661209" visible="true" version="6" changeset="9914127" timestamp="2011-11-22T21:46:44Z" user="lapinos03" uid="33634" lat="48.8337517" lon="2.3333992"/> 
<node id="24912996" visible="true" version="2" changeset="806076" timestamp="2009-03-14T10:38:25Z" user="Goon" uid="24657" lat="48.8302268" lon="2.3338015"> 
    <tag k="crossing" v="uncontrolled"/> 
    <tag k="highway" v="traffic_signals"/> 
</node> 
<node id="24912994" visible="true" version="5" changeset="5904801" timestamp="2010-09-28T15:32:01Z" user="maouth-" uid="322872" lat="48.8301333" lon="2.3309869"> 
    <tag k="highway" v="mini_roundabout"/> 
</node> 
</osm> 
Possible duplicate of [How to read XML files from Apache Spark framework?](http://stackoverflow.com/questions/20225129/how-to-read-xml-files-from-apache-spark-framework)

Possible duplicate of [Xml processing in Spark](http://stackoverflow.com/questions/33078221/xml-processing-in-spark) – zero323

Answer

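As mentioned in another answer, spark-xml from Databricks is one way to read XML. However, there is currently a bug in spark-xml that prevents you from importing self-closing elements. To work around this, you can import the entire XML document as a single value and then do something like the following: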

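// Path to the OSM XML file (adjust for your environment)
val pathToYourData = "Z:/test.xml"
// Use the root element <osm> as the row tag, so the whole document is loaded as a single row
val osm = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "osm").load(pathToYourData)
// Turn the array of <node> elements into one row per node
val nodes = osm.selectExpr("explode(node) as node")
// Flatten the node struct into top-level columns and print them
nodes.select("node.*").show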
/* 
+------+----------+--------+----------+---------+--------------------+-------+---------+--------+--------+--------------------+
|#VALUE|@changeset|     @id|      @lat|     @lon|          @timestamp|   @uid|    @user|@version|@visible|                 tag|
+------+----------+--------+----------+---------+--------------------+-------+---------+--------+--------+--------------------+
|  null|  24482318|  430785|48.8340725|2.3309196|2014-08-01T14:24:53Z|1779584|    dhuyp|       8|    true|                null|
|  null|   9914127|  661209|48.8337517|2.3333992|2011-11-22T21:46:44Z|  33634|lapinos03|       6|    true|                null|
|  null|    806076|24912996|48.8302268|2.3338015|2009-03-14T10:38:25Z|  24657|     Goon|       2|    true|[[null,crossing,u...|
|  null|   5904801|24912994|48.8301333|2.3309869|2010-09-28T15:32:01Z| 322872|  maouth-|       5|    true|[[null,highway,mi...|
+------+----------+--------+----------+---------+--------------------+-------+---------+--------+--------+--------------------+
*/
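
The `tag` column in the output above is still a nested array. As a rough sketch of one way to flatten it, the snippet below explodes that array into one row per tag. It assumes that `nodes` from the code above is in scope in the same spark-shell session, and that the attribute columns are named `@id`, `@k`, and `@v` as in the printed output. Those names depend on the spark-xml version and its attribute-prefix setting, so check `nodes.printSchema()` first. The names `nodeTags`, `id`, `k`, and `v` are made up for illustration.

```scala
// Assumption: `nodes` is the DataFrame built above, with a struct column `node`
// whose `tag` field is an array of structs holding the @k and @v attributes.
val nodeTags = nodes
  // Keep the node id and emit one row per element of the tag array.
  // Backticks are needed because the column names start with '@'.
  .selectExpr("node.`@id` as id", "explode(node.tag) as tag")
  // Pull the key and value attributes out of each tag struct.
  .selectExpr("id", "tag.`@k` as k", "tag.`@v` as v")

nodeTags.show
```

Because `explode` produces no rows for a null array, nodes without any `<tag>` children should not appear in `nodeTags`.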