2011-08-17

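Extracting data from XML as text via XQuery

I have a fairly large set of XML files from which I want to extract some data. I am using the evaluation version of Altova XMLSpy, in which I managed to get XPath working. However, I need the data in CSV or text format so that I can evaluate it further in R or Excel, and I cannot copy the XPath results to a file. I found that this is possible with XQuery, but I cannot get XQuery to work even for a single file.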

The XML is structured like this:

<d2LogicalModel xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datex2.eu/schema/2_0/2_0" modelBaseVersion="2.0" xsi:schemaLocation="http://datex2.eu/schema/2_0/2_0 D:\NDW\CSS\DataGenerator\DATEXIISchema_2_0_2_0.xsd"> 
<payloadPublication xmlns="http://datex2.eu/schema/2_0/2_0" xsi:type="MeasuredDataPublication" lang="nl"> 
    <publicationTime>2011-04-21T05:58:34Z</publicationTime> 
    <publicationCreator> 
     <country>nl</country> 
     <nationalIdentifier>NDW-CNS</nationalIdentifier> 
    </publicationCreator> 
    <measurementSiteTableReference>NDW01_MT_321</measurementSiteTableReference> 
    <headerInformation> 
     <areaOfInterest>national</areaOfInterest> 
     <confidentiality>restrictedToAuthorities</confidentiality> 
     <informationStatus>real</informationStatus> 
    </headerInformation> 
    <siteMeasurements> 
     <measurementSiteReference>GRT01_MORO_1002_2</measurementSiteReference> 
     <measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault> 
     <measuredValue index="1"> 
      <basicDataValue xsi:type="TrafficSpeed"/> 
     </measuredValue> 
     <measuredValue index="2"> 
      <basicDataValue xsi:type="TrafficSpeed"/> 
     </measuredValue> 
     <measuredValue index="3"> 
      <basicDataValue xsi:type="TrafficSpeed"/> 
     </measuredValue> 
     <measuredValue index="4"> 
      <basicDataValue xsi:type="TrafficSpeed"/> 
     </measuredValue> 
     <measuredValue index="5"> 
      <basicDataValue xsi:type="TrafficSpeed"/> 
     </measuredValue> 
     <measuredValue index="6"> 
      <basicDataValue xsi:type="TrafficSpeed"/> 
     </measuredValue> 
    </siteMeasurements> 
    <siteMeasurements> 
     <measurementSiteReference>RWS01_MONIBAS_0021hrr2131ra</measurementSiteReference> 
     <measurementTimeDefault>2011-04-21T05:57:00Z</measurementTimeDefault> 
     <measuredValue index="1"> 
      <basicDataValue xsi:type="TrafficFlow"> 
       <time>2011-04-21T05:56:00Z</time> 
       <vehicleFlow>900</vehicleFlow> 
      </basicDataValue> 
     </measuredValue> 
     <measuredValue index="2"> 
      <basicDataValue xsi:type="TrafficSpeed"> 
       <numberOfInputValuesUsed>60</numberOfInputValuesUsed> 
       <standardDeviation>0</standardDeviation> 
       <time>2011-04-21T05:56:00Z</time> 
       <averageVehicleSpeed>115</averageVehicleSpeed> 
      </basicDataValue> 
     </measuredValue> 
     <measuredValue index="3"> 
      <basicDataValue xsi:type="TrafficFlow"> 
       <time>2011-04-21T05:56:00Z</time> 
       <vehicleFlow>1020</vehicleFlow> 
      </basicDataValue> 
     </measuredValue> 
     <measuredValue index="4"> 
      <basicDataValue xsi:type="TrafficSpeed"> 
       <numberOfInputValuesUsed>60</numberOfInputValuesUsed> 
       <standardDeviation>0</standardDeviation> 
       <time>2011-04-21T05:56:00Z</time> 
       <averageVehicleSpeed>104</averageVehicleSpeed> 
      </basicDataValue> 
     </measuredValue> 
    </siteMeasurements> 

I want to filter on a specific value of measurementSiteReference and get the results of every measuredValue whose basicDataValue is of type TrafficFlow, preferably in this format:

index, value, timestamp 
1, 900, 05:56:00 
3, 1020, 05:56:00 

I have the following XPath:

//text()[contains(.,"GEO01_Z_RWSTI1011")]/parent::*/parent::*/descendant::measuredValue[(@index)]/basicDataValue/vehicleFlow 

This gives me results for one file, but I cannot find a way to convert the XPath into XQuery. My current XQuery returns no results:

let $nl := "&#10;" 
for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/d2LogicalModel/payloadPublication/siteMeasurements 
where $x/measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")] 
return concat($x/measurementSiteReference/measuredValue,$nl) 

How can I use XQuery to get the output I want?

Answers


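Your elements are bound to the namespace xmlns="http://datex2.eu/schema/2_0/2_0", but you are not namespace-qualifying the elements in your XPath statements. As a result, those statements do not select the elements you want.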

You will want to do something like this: declare the namespace and use its prefix in your XPath statements:

declare namespace datex = "http://datex2.eu/schema/2_0/2_0"; 

let $nl := "&#10;" 

for $x in doc("TrafficSpeed 20110421 0800-1559\0800_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements 
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")] 
return concat($x/datex:measurementSiteReference/datex:measuredValue,$nl) 

However, you will most likely run into a problem using concat() with a sequence, so the current code will not produce the output you want.
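To illustrate the concat() issue mentioned above, here is a minimal sketch. The two string values are hypothetical stand-ins for the text of several vehicleFlow elements; they are not read from your file. concat() expects each argument to be a single item (or empty), whereas string-join() accepts a whole sequence plus a separator.

```xquery
(: Hypothetical values standing in for the content of several vehicleFlow elements :)
let $flows := ("900", "1020")
let $nl := "&#10;"
(: concat($flows, $nl) would raise a type error here, because $flows holds two items :)
return string-join($flows, $nl)
```

When joining element nodes rather than strings, wrapping each node in string() first avoids depending on how a particular processor converts the values.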


Thanks! The namespace hint did help: I now get results (once I drop concat()). I still need to separate the results. – ThijsMuis


I managed to get an answer, although it is not as complete as I would like:

declare namespace datex = "http://datex2.eu/schema/2_0/2_0"; 
declare variable $sep := ','; 
declare variable $eol := '&#10;'; 

for $x in collection("0900_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements 
let $site := $x/datex:measurementSiteReference/text() 
let $time := $x/datex:measurementTimeDefault/text() 
let $index := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/@index 
let $flow := $x/datex:measurementSiteReference/parent::*/descendant::datex:measuredValue/datex:basicDataValue/datex:vehicleFlow/text() 
where $x/datex:measurementSiteReference/text()[contains(.,"GEO01_Z_RWSTI1011")] 
return string(concat(string-join(($site,$time,$flow),$sep),$eol))
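For what it is worth, one way to get one row per measuredValue might look like the sketch below. It has not been run against the real files and rests on a few assumptions: collection("0900_trafficspeed") resolves the same way as in the query above (that is processor-dependent), each measuredValue holds at most one vehicleFlow and one time element, and "has a vehicleFlow child" is an acceptable stand-in for "is of type TrafficFlow". The variable names $site, $mv and $flow are my own.

```xquery
declare namespace datex = "http://datex2.eu/schema/2_0/2_0";
declare variable $sep := ", ";
declare variable $eol := "&#10;";

(: Build one "index, value, timestamp" string per measuredValue, then join the rows with newlines :)
string-join(
  for $site in collection("0900_trafficspeed")/datex:d2LogicalModel/datex:payloadPublication/datex:siteMeasurements
  where contains($site/datex:measurementSiteReference, "GEO01_Z_RWSTI1011")
  return
    (: measuredValue is a child of siteMeasurements, a sibling of measurementSiteReference :)
    for $mv in $site/datex:measuredValue
    let $flow := $mv/datex:basicDataValue/datex:vehicleFlow
    (: Assumption: only TrafficFlow values carry a vehicleFlow element :)
    where exists($flow)
    return string-join(
      (string($mv/@index),
       string($flow),
       (: Characters 12 to 19 of e.g. 2011-04-21T05:56:00Z are the time of day :)
       substring(string($mv/datex:basicDataValue/datex:time), 12, 8)),
      $sep),
  $eol)
```

Iterating over measuredValue keeps each index paired with its own flow and time, instead of collecting all indexes and all flows of a site into two separate sequences. For a site shaped like RWS01_MONIBAS_0021hrr2131ra in the sample, this should give rows in the index, value, timestamp form asked for in the question.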