Optimizing a slow XQuery query

Right now the query takes about 2 minutes; before I made some changes it took 3:48 min.

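The XML documents are fetched from a web page, because the data changes every 5 minutes and provides real-time information about the buses.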

Can you help me optimize this query?

xquery version "3.0"; 
declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx"; 

declare function local:getNombreParada($numero) 
{ 
    for $parada in doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml")//paradas/bus:parada 
    where $numero=$parada/bus:idparada 
    return $parada/bus:descripcion 
}; 

declare function local:getBusesPorLinea($linea) 
{ 

    let $numero:=$linea 
    let $nBuses:=count(doc("http://datos.gijon.es/doc/transporte/busgijontr.xml")//bus:llegada[bus:idlinea=$numero]) 

    return 
    if($nBuses=0) 
    then(<p>No hay ningun bus en esta linea</p>) 
    else(
    <div> 
     <h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2> 

    <table class="table table-hover"> 
     <thead> 
      <tr> 
      <th>Parada</th> 
      <th>Minutos hasta la llegada</th> 
      </tr> 
     </thead> 
     <tbody> 
      { 
      for $l in doc("http://datos.gijon.es/doc/transporte/busgijontr.xml")//bus:llegada[bus:idlinea=$numero] 
       for $parada in doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml")//paradas/bus:parada[bus:idparada=$l/bus:idparada] 


      return <tr> 
         <td>{$parada/bus:descripcion}</td> 
         <td>{$l/bus:minutos}</td></tr> 
      } 
     </tbody> 
    </table> 

    </div> 
    ) 


}; 

local:getBusesPorLinea(1)

PS: I am running this in eXist-db.

2016-01-04 Roberto Fernandez

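Just a side note (what Michael Kay said is still the most relevant): try to avoid '//' if you run into performance problems. It always triggers a full scan of all descendants, and a specific lookup (i.e. a full path) will definitely be faster. – dirkk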

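@dirkk Actually, that is not true for eXist. The opposite holds: because of the way its indexes work, '//' is much faster than a full path. That assumes, of course, that you have created the indexes first ;-) – adamretter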

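@adamretter Interesting, I did not know that. I am also surprised, because a specific path always carries more information than a plain descendant-or-self operator, i.e. an optimizer could (in theory) rewrite a specific path to '//', while the reverse is not possible. In any case, I will avoid making this statement about XQuery processors in general in the future; it seems it only applies to BaseX. – dirkk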

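Are the documents cached? I am not an expert, but your code seems to access the same document several times. If you are sure the contents are cached by the execution environment, that is fine. Otherwise, I would try declaring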

declare variable $docinfo := doc("http://datos.gijon.es/doc/transporte/busgijoninfo.xml"); 
declare variable $doctr := doc("http://datos.gijon.es/doc/transporte/busgijontr.xml");

to make sure each document is read only once.

You also scan each document at least twice for the same kind of data. I would do that just once:

declare variable $paradas := $docinfo//paradas; 
declare variable $llegadas := $doctr//bus:llegada;

and then only filter the resulting sequences:

declare function local:getNombreParada($numero) 
{ 
    $paradas/bus:parada[bus:idparada = $numero]/bus:descripcion 
}; 

declare function local:getBusesPorLinea($linea) 
{ 
    let $numero:=$linea 
    let $llegadasNum:=$llegadas[bus:idlinea=$numero] 
    let $nBuses:=count($llegadasNum) 

    return 

    if($nBuses=0) 
    then(<p>No hay ningun bus en esta linea</p>) 
    else(
    <div> 
     <h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2> 

    <table class="table table-hover"> 
     <thead> 
      <tr> 
      <th>Parada</th> 
      <th>Minutos hasta la llegada</th> 
      </tr> 
     </thead> 
     <tbody> 
      { 
      for $l in $llegadasNum 
       for $parada in $paradas/bus:parada[bus:idparada=$l/bus:idparada] 
       return <tr> 
         <td>{$parada/bus:descripcion}</td> 
         <td>{$l/bus:minutos}</td></tr> 
      } 
     </tbody> 
    </table> 

    </div> 
    ) 
};

It may not be much faster, but I hope it is a bit more readable.

2016-01-04 21:44:56 CiaPan

Without smart optimization, this join expression:

for $l in doc("a.xml")//bus:llegada[bus:idlinea=$numero] 
    for $parada in doc("b.xml")//paradas/bus:parada[bus:idparada=$l/bus:idparada] 
return <tr>...</tr>

will have quadratic performance. You have not told us anything about the document sizes, but that is where I would start looking.

In an XML database environment, the usual way to tackle this kind of problem is to create appropriate indexes.
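
For eXist-db specifically, such indexes are declared in a collection.xconf file. The fragment below is only a sketch under several assumptions: the two feeds have been stored in a hypothetical collection /db/busgijon/data, the join keys are the bus:idlinea and bus:idparada elements from the question, and xs:string is a guess at their type (the real values may well be integers).

```xml
<!-- Hypothetical location: /db/system/config/db/busgijon/data/collection.xconf -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:bus="http://docs.gijon.es/sw/busgijon.asmx"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <!-- Range indexes on the two elements used in the predicates -->
        <range>
            <create qname="bus:idlinea" type="xs:string"/>
            <create qname="bus:idparada" type="xs:string"/>
        </range>
    </index>
</collection>
```

Documents that were stored before the configuration existed need to be reindexed (for example with xmldb:reindex on that collection) before the index can be used. Indexes only apply to documents stored in the database, not to documents fetched over HTTP with doc().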

2016-01-05 00:01:59

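First of all, the best way to optimize queries in eXist is to store the XML locally and index it. Please refer to the built-in documentation for how to set up the indexes.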

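However, your code fetches the same data from the network over and over. Let's focus on that, and on a second issue: querying in-memory XML, which is another optimization bottleneck.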

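The most important first step is to have the XML you are querying in the local database. Queries against nodes stored in the database are faster and use less memory than queries against in-memory XML nodes. (At least, that was the case as of version 2.2.)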

So here is a way to cache the data locally, refreshing the cache once the last update is more than 5 minutes old.

xquery version "3.0"; 

declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx"; 

(: Store the XML data in the collection /db/busgijon/data :) 
declare variable $COL := "/db/busgijon/data"; 
declare variable $INFO-FILE := "busgijoninfo.xml"; 
declare variable $TR-FILE := "busgijontr.xml"; 

(: Fetch a page from cache or from web site, updating the cache :) 
declare function local:fetchPage($filename) { 
    (: If the page was fetched more than 5 minutes ago, refresh it :) 
    let $expire := current-dateTime() - xs:dayTimeDuration('PT5M') 
    let $page := doc($COL || "/" || $filename)/page 
    return if (exists($page)) 
     then if ($page/xs:dateTime(@timestamp) ge $expire) 
      then $page 
      else (update replace $page/* with doc("http://datos.gijon.es/doc/transporte/" || $filename)/* 
       , update value $page/@timestamp with current-dateTime() 
       , $page) 
     else doc(xmldb:store($COL, $filename, <page timestamp="{current-dateTime()}">{doc("http://datos.gijon.es/doc/transporte/" || $filename)/*}</page>))/page 
}; 

declare function local:getBusesPorLinea($linea) 
{ 
    (: Get the two pages from the database cache for querying :) 
    let $info := local:fetchPage($INFO-FILE)/bus:BusGijonInfo 
    let $tr := local:fetchPage($TR-FILE)/bus:BusGijonTr 

    let $numero:=$linea 
    let $nBuses:=count($tr//bus:llegada[bus:idlinea=$numero]) 

    return 
    if($nBuses=0) 
    then(<p>No hay ningun bus en esta linea</p>) 
    else(
    <div> 
     <h2>Numero de buses funcionando en la linea {$numero} : {$nBuses}</h2> 

    <table class="table table-hover"> 
     <thead> 
      <tr> 
      <th>Parada</th> 
      <th>Minutos hasta la llegada</th> 
      </tr> 
     </thead> 
     <tbody> 
      { 
      (: Loop through the TR page - fetched just once from cache :) 
      for $l in $tr//bus:llegada[bus:idlinea=$numero] 
       (: Loop through the Info page - fetched just once from cache :) 
       for $parada in $info//paradas/bus:parada[bus:idparada=$l/bus:idparada] 


      return <tr> 
         <td>{$parada/bus:descripcion}</td> 
         <td>{$l/bus:minutos}</td></tr> 
      } 
     </tbody> 
    </table> 

    </div> 
    ) 


}; 

local:getBusesPorLinea(1)

The only part of local:getBusesPorLinea that I changed is at the top, where the two documents are read from the cache and then used in the nested loops.

The local:fetchPage function is where most of the speed-up happens. Here is what it does:

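Sets the expiry timestamp to 5 minutes before the current time.
Tries to get the specified page from the cache.
If the page exists, compares its fetch timestamp with the expiry timestamp.
If the page's timestamp is less than 5 minutes old (greater than or equal to the expiry timestamp), returns the page.
If the page's timestamp is more than 5 minutes old, fetches the document again, replaces the page contents with the refreshed document, updates the page's timestamp, and returns the new page.
If the page does not exist yet, stores it in the specified collection with the current timestamp and returns the page element.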

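Once the 5 minutes are up, the first person to access this XQuery will see a delay of about 5-10 seconds while the cache is refreshed. This makes the cache passive, so you do not have to refresh it manually every five minutes.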

Hope this helps.

2016-01-05 13:04:05 westbaystars

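Thanks. I added this to the other solution suggested by CiaPan and now it takes 2 seconds (with CiaPan's suggestion alone it took 4 s, and before that it took 2 minutes; I don't really know why the difference is so big) –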

If you add a typed index in eXist-db and adjust the predicates, you should be able to get this query down to 10 ms. – adamretter

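@RobertoFernandez Inside the 'for $l' loop you request loading the 'busgijoninfo.xml' file in order to scan its 'bus:parada' elements. If documents are not cached, you fetch that file as many times as there are 'bus:llegada' elements in 'busgijontr.xml' that satisfy the '[bus:idlinea = $numero]' predicate. That means about 30 file loads for '$numero = 4', about 60 for '$numero = 1', and about 65 for '$numero = 20'. This is probably why fetching the file once cut the run time to approx. 1/30 of its previous value. – CiaPan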

Another tip: for queries in eXist-db, it is better to avoid the where clause. XPath predicates usually perform better.
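
As a rough illustration of that rewrite, here is the stop-name lookup from the question written both ways. This is a minimal sketch: the inline $paradas element and the function names local:conWhere and local:conPredicado are made up for the example and stand in for the real busgijoninfo.xml data.

```xquery
xquery version "3.0";
declare namespace bus="http://docs.gijon.es/sw/busgijon.asmx";

(: Hypothetical sample data standing in for busgijoninfo.xml :)
declare variable $paradas :=
    <paradas>
        <bus:parada>
            <bus:idparada>1</bus:idparada>
            <bus:descripcion>Parada A</bus:descripcion>
        </bus:parada>
        <bus:parada>
            <bus:idparada>2</bus:idparada>
            <bus:descripcion>Parada B</bus:descripcion>
        </bus:parada>
    </paradas>;

(: FLWOR with a where clause, as in the question :)
declare function local:conWhere($numero) {
    for $parada in $paradas/bus:parada
    where $parada/bus:idparada = $numero
    return $parada/bus:descripcion
};

(: The same lookup as a single path expression with a predicate :)
declare function local:conPredicado($numero) {
    $paradas/bus:parada[bus:idparada = $numero]/bus:descripcion
};

(: Both calls select the bus:descripcion element of stop 2 :)
(local:conWhere(2), local:conPredicado(2))
```

On in-memory data like this, the two forms are equivalent and there is no index to exploit. The difference the tip refers to should only show up when the documents are stored in the database, where a predicate can be answered from an index.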

There are quite a few more tips at http://exist-db.org/exist/apps/doc/tuning.xml?q=performance&field=all&id=D2.2.2#D2.2.6

2016-04-04 21:01:11 DiZzZz
