2012-07-18

Searching XML in Clojure

I have the following sample XML:

<data>
  <products>
    <product>
      <section>Red Section</section>
      <images>
        <image>img.jpg</image>
        <image>img2.jpg</image>
      </images>
    </product>
    <product>
      <section>Blue Section</section>
      <images>
        <image>img.jpg</image>
        <image>img3.jpg</image>
      </images>
    </product>
    <product>
      <section>Green Section</section>
      <images>
        <image>img.jpg</image>
        <image>img2.jpg</image>
      </images>
    </product>
  </products>
</data>

I know how to parse it in Clojure:

(require '[clojure.xml :as xml])
(def x (xml/parse "location/of/that/xml"))

This returns a nested map describing the XML:

{:tag :data,
 :attrs nil,
 :content [
   {:tag :products,
    :attrs nil,
    :content [
      {:tag :product,
       :attrs nil,
       :content [] ..
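Because clojure.xml/parse accepts a File, an InputStream or a string naming a URI, one way to experiment without a file on disk is to wrap an XML string in an InputStream first. The sketch below assumes that approach; parse-str is a hypothetical helper name, not part of clojure.xml. It also shows the map shape: each element has :tag, :attrs and :content, and :content holds child elements plus any non-blank text.

```clojure
(ns example.parse-string
  (:require [clojure.xml :as xml]))

;; Hypothetical helper: clojure.xml/parse takes a File, InputStream or URI
;; string, so an in-memory XML string is wrapped in an InputStream first.
(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def sample
  "<data><products>
     <product><section>Red Section</section>
       <images><image>img.jpg</image><image>img2.jpg</image></images></product>
   </products></data>")

(def root (parse-str sample))

;; Each element is a map; whitespace-only text between tags is not kept.
(:tag root)                      ; => :data
(-> root :content first :tag)    ; => :products
```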

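This structure can of course be traversed with standard Clojure functions, but that can get very verbose, especially compared with querying it with something like XPath. Are there any helpers for traversing and searching such a structure? For example, how can I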

  • get a list of all <product> elements
  • get only the products whose <images> tag contains an <image> with the text "img2.jpg"
  • get the product whose section is "Red Section"
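For comparison, the "standard Clojure functions" route mentioned above can be sketched with clojure.core's xml-seq and filter alone. This is a minimal sketch, assuming the XML is parsed with clojure.xml; parse-str, elements-tagged and text-of are hypothetical helpers introduced only for illustration.

```clojure
(ns example.xml-seq-queries
  (:require [clojure.xml :as xml]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (parse-str
    "<data><products>
       <product><section>Red Section</section>
         <images><image>img.jpg</image><image>img2.jpg</image></images></product>
       <product><section>Blue Section</section>
         <images><image>img.jpg</image><image>img3.jpg</image></images></product>
       <product><section>Green Section</section>
         <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     </products></data>"))

(defn elements-tagged
  "Hypothetical helper: all elements tagged `t` in `node`, depth-first."
  [node t]
  (filter #(= t (:tag %)) (xml-seq node)))

(defn text-of
  "Hypothetical helper: concatenated text children of an element."
  [el]
  (apply str (filter string? (:content el))))

;; 1. all <product> elements
(def all-products (elements-tagged root :product))

;; 2. products containing an <image> whose text is "img2.jpg"
(def img2-products
  (filter (fn [p] (some #(= "img2.jpg" (text-of %)) (elements-tagged p :image)))
          all-products))

;; 3. products whose <section> text is "Red Section"
(def red-products
  (filter (fn [p] (some #(= "Red Section" (text-of %)) (elements-tagged p :section)))
          all-products))

(map #(text-of (first (elements-tagged % :section))) img2-products)
;; => ("Red Section" "Green Section")
```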

Thanks

Answers


Using zippers from data.zip, here is a solution for your second use case:

(ns core
  (:require [clojure.zip :as zip]
            [clojure.xml :as xml]
            [clojure.data.zip.xml :refer [xml-> xml1-> text text=]]))

(def data (zip/xml-zip (xml/parse PATH))) 
(def products (xml-> data :products :product)) 

(for [product products
      :let [image (xml-> product :images :image)]
      :when (some (text= "img2.jpg") image)]
  {:section (xml1-> product :section text)
   :images  (map text image)})
=> ({:section "Red Section", :images ("img.jpg" "img2.jpg")} 
    {:section "Green Section", :images ("img.jpg" "img2.jpg")}) 
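data.zip builds on the zippers in clojure.zip. As a rough illustration of what a zipper location is, the sketch below walks to the second product's section text using only core clojure.zip functions (xml-zip, down, right, node). It assumes a trimmed-down copy of the sample XML and a hypothetical parse-str helper.

```clojure
(ns example.zipper-walk
  (:require [clojure.xml :as xml]
            [clojure.zip :as zip]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (zip/xml-zip
    (parse-str
      "<data><products>
         <product><section>Red Section</section></product>
         <product><section>Blue Section</section></product>
       </products></data>")))

;; A zipper location holds the focused node plus the path back to the root,
;; so we can step down/right and then read the node under focus.
(def blue-section-text
  (-> root
      zip/down    ; <products>
      zip/down    ; first <product>
      zip/right   ; second <product>
      zip/down    ; its <section>
      zip/down    ; the text node "Blue Section"
      zip/node))
```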

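The thread-first macro, together with Clojure's map and vector semantics, is adequate syntax for accessing XML. There are cases where you want something more XML-specific (like an XPath library), but in many cases the core language is nearly as concise as adding a dependency.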

(require '[clojure.xml :as xml]
         '[clojure.pprint :refer [pprint]])

(pprint (-> (xml/parse "/tmp/xml")
            :content first :content second :content first :content first))
"Blue Section" 
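Since clojure.xml stores :content as a vector, the same path can also be written with get-in and numeric indices. A small sketch comparing the two forms, assuming a trimmed copy of the sample XML and a hypothetical parse-str helper:

```clojure
(ns example.get-in-path
  (:require [clojure.xml :as xml]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (parse-str
    "<data><products>
       <product><section>Red Section</section></product>
       <product><section>Blue Section</section></product>
     </products></data>"))

;; Thread-first with first/second, as in the answer above
(def via-threading
  (-> root :content first :content second :content first :content first))

;; The same path as data: vector indices work because :content is a vector
(def via-get-in
  (get-in root [:content 0 :content 1 :content 0 :content 0]))
```

Writing the path as a vector has the small advantage that it can be stored, built up or passed around like any other value.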

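Here is an alternative version using data.zip that covers all three use cases. I find that xml-> and xml1-> have very powerful built-in navigation, with sub-queries expressed as vectors.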

;; [org.clojure/data.zip "0.1.1"] 

(ns example.core 
    (:require 
    [clojure.zip :as zip] 
    [clojure.xml :as xml] 
    [clojure.data.zip.xml :refer [text xml-> xml1->]])) 

(def data (zip/xml-zip (xml/parse "/tmp/products.xml"))) 

(let [all-products (xml-> data :products :product) 
     red-section (xml1-> data :products :product [:section "Red Section"]) 
     img2 (xml-> data :products :product [:images [:image "img2.jpg"]])] 
    {:all-products (map (fn [product] (xml1-> product :section text)) all-products) 
    :red-section (xml1-> red-section :section text) 
    :img2 (map (fn [product] (xml1-> product :section text)) img2)}) 

=> {:all-products ("Red Section" "Blue Section" "Green Section"), 
    :red-section "Red Section", 
    :img2 ("Red Section" "Green Section")} 
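The nested vector [:images [:image "img2.jpg"]] reads as "has an images child, which has an image child with this text". A rough plain-Clojure analogue of that predicate, written against clojure.xml maps rather than data.zip locations, might look like the following. has-path-text?, section-text and parse-str are hypothetical names, and this is not how data.zip implements its queries.

```clojure
(ns example.path-predicate
  (:require [clojure.xml :as xml]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def root
  (parse-str
    "<data><products>
       <product><section>Red Section</section>
         <images><image>img.jpg</image><image>img2.jpg</image></images></product>
       <product><section>Blue Section</section>
         <images><image>img.jpg</image><image>img3.jpg</image></images></product>
       <product><section>Green Section</section>
         <images><image>img.jpg</image><image>img2.jpg</image></images></product>
     </products></data>"))

(defn has-path-text?
  "Hypothetical helper: true when following `tags` one child level at a time
  from `node` reaches an element whose content is exactly the string `s`."
  [node tags s]
  (if (empty? tags)
    (= [s] (:content node))
    (boolean
      (some #(and (map? %)
                  (= (first tags) (:tag %))
                  (has-path-text? % (rest tags) s))
            (:content node)))))

;; Navigation: the <product> children of <products>
(def products
  (filter #(= :product (:tag %)) (-> root :content first :content)))

;; Reporting: the text of a product's first child (its <section>)
(defn section-text [product]
  (-> product :content first :content first))

(def img2-products (filter #(has-path-text? % [:images :image] "img2.jpg") products))
(def red-products  (filter #(has-path-text? % [:section] "Red Section") products))
```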

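+1 I know you answered later, but yours is the only answer that covers all 3 questions, and you nicely separate navigation from reporting the results – 2017-02-03 14:45:28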


The Tupelo library makes it easy to solve problems like this with the tupelo.forest tree data structure. Please see this question for more information. The API docs can be found here.

Here we load your XML data, converting it first into Enlive format and then into the native tree structure used by tupelo.forest. Libs and data def:

(ns tst.tupelo.forest-examples 
    (:use tupelo.forest tupelo.test) 
    (:require 
    [clojure.data.xml :as dx] 
    [clojure.java.io :as io] 
    [clojure.set :as cs] 
    [net.cgrand.enlive-html :as en-html] 
    [schema.core :as s] 
    [tupelo.core :as t] 
    [tupelo.string :as ts])) 
(t/refer-tupelo) 

(def xml-str-prod "<data> 
        <products> 
         <product> 
         <section>Red Section</section> 
         <images> 
          <image>img.jpg</image> 
          <image>img2.jpg</image> 
         </images> 
         </product> 
         <product> 
         <section>Blue Section</section> 
         <images> 
          <image>img.jpg</image> 
          <image>img3.jpg</image> 
         </images> 
         </product> 
         <product> 
         <section>Green Section</section> 
         <images> 
          <image>img.jpg</image> 
          <image>img2.jpg</image> 
         </images> 
         </product> 
        </products> 
        </data> ") 

And the initialization code:

(dotest 
    (with-forest (new-forest) 
    (let [enlive-tree   (->> xml-str-prod 
           java.io.StringReader. 
           en-html/html-resource 
           first) 
      root-hid    (add-tree-enlive enlive-tree) 
      tree-1    (hid->hiccup root-hid) 

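The hid suffix stands for "Hex ID", a unique hex value that acts like a pointer to a node or leaf in the tree. At this stage we have just loaded the data into the forest data structure, creating tree-1, which looks like: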

[:data
 [:tupelo.forest/raw "\n     "]
 [:products
  [:tupelo.forest/raw "\n      "]
  [:product
   [:tupelo.forest/raw "\n      "]
   [:section "Red Section"]
   [:tupelo.forest/raw "\n      "]
   [:images
    [:tupelo.forest/raw "\n       "]
    [:image "img.jpg"]
    [:tupelo.forest/raw "\n       "]
    [:image "img2.jpg"]
    [:tupelo.forest/raw "\n      "]]
   [:tupelo.forest/raw "\n      "]]
  [:tupelo.forest/raw "\n      "]
  [:product
   [:tupelo.forest/raw "\n      "]
   [:section "Blue Section"]
   [:tupelo.forest/raw "\n      "]
   [:images
    [:tupelo.forest/raw "\n       "]
    [:image "img.jpg"]
    [:tupelo.forest/raw "\n       "]
    [:image "img3.jpg"]
    [:tupelo.forest/raw "\n      "]]
   [:tupelo.forest/raw "\n      "]]
  [:tupelo.forest/raw "\n      "]
  [:product
   [:tupelo.forest/raw "\n      "]
   [:section "Green Section"]
   [:tupelo.forest/raw "\n      "]
   [:images
    [:tupelo.forest/raw "\n       "]
    [:image "img.jpg"]
    [:tupelo.forest/raw "\n       "]
    [:image "img2.jpg"]
    [:tupelo.forest/raw "\n      "]]
   [:tupelo.forest/raw "\n      "]]
  [:tupelo.forest/raw "\n     "]]
 [:tupelo.forest/raw "\n     "]]

Next, we remove all the blank strings with this code:

blank-leaf-hid?  (fn [hid] (and (leaf-hid? hid) ; ensure it is a leaf node
                               (let [value (hid->value hid)]
                                 (and (string? value)
                                      (or (zero? (count value))        ; empty string
                                          (ts/whitespace? value)))))) ; all-whitespace string

blank-leaf-hids  (keep-if blank-leaf-hid? (all-hids))
>>               (apply remove-hid blank-leaf-hids) ; `>>` is a throwaway binding name; the call is made for its side effect
tree-2           (hid->hiccup root-hid)
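For readers without Tupelo at hand, the same clean-up step can be approximated on a plain hiccup vector with clojure.walk/postwalk. This sketch assumes the whitespace leaves look like the [:tupelo.forest/raw ...] entries in tree-1 above; blank-raw? and strip-blank-raw are hypothetical names, and the code does not use the tupelo.forest API.

```clojure
(ns example.strip-blank
  (:require [clojure.string :as str]
            [clojure.walk :as walk]))

(defn blank-raw?
  "Hypothetical helper: true for leaves like [:tupelo.forest/raw <whitespace>]."
  [x]
  (and (vector? x)
       (= :tupelo.forest/raw (first x))
       (string? (second x))
       (str/blank? (second x))))

(defn strip-blank-raw
  "Removes whitespace-only raw leaves at every level of a hiccup tree.
  postwalk rebuilds children before parents, so each vector drops its
  blank raw children on the way back up."
  [tree]
  (walk/postwalk
    (fn [x] (if (vector? x) (into [] (remove blank-raw?) x) x))
    tree))

(def tree-1-like
  [:data
   [:tupelo.forest/raw "\n  "]
   [:products
    [:tupelo.forest/raw "\n   "]
    [:product
     [:section "Red Section"]
     [:tupelo.forest/raw "\n   "]
     [:images [:image "img.jpg"] [:image "img2.jpg"]]]]
   [:tupelo.forest/raw "\n"]])

(strip-blank-raw tree-1-like)
;; => [:data [:products [:product [:section "Red Section"] [:images [:image "img.jpg"] [:image "img2.jpg"]]]]]
```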

This produces a much nicer result tree (in hiccup format):

[:data 
[:products 
    [:product 
    [:section "Red Section"] 
    [:images [:image "img.jpg"] [:image "img2.jpg"]]] 
    [:product 
    [:section "Blue Section"] 
    [:images [:image "img.jpg"] [:image "img3.jpg"]]] 
    [:product 
    [:section "Green Section"] 
    [:images [:image "img.jpg"] [:image "img2.jpg"]]]]] 
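The hiccup shape shown here can also be produced directly from clojure.xml output with a short recursive function. A minimal sketch under the assumption that attributes can be ignored; xml->hiccup and parse-str are hypothetical helpers, not Tupelo functions.

```clojure
(ns example.xml-to-hiccup
  (:require [clojure.xml :as xml]))

(defn parse-str [^String s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn xml->hiccup
  "Hypothetical helper: converts a clojure.xml element map into a hiccup
  vector [tag & children]. Text nodes pass through; :attrs are dropped."
  [node]
  (if (string? node)
    node
    (into [(:tag node)] (map xml->hiccup (:content node)))))

(def red-product
  (xml->hiccup
    (parse-str
      "<product><section>Red Section</section>
         <images><image>img.jpg</image><image>img2.jpg</image></images></product>")))
;; => [:product [:section "Red Section"] [:images [:image "img.jpg"] [:image "img2.jpg"]]]
```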

The following code then computes the answers to the three questions above:

product-hids   (find-hids root-hid [:** :product]) 
product-trees-hiccup (mapv hid->hiccup product-hids) 

img2-paths   (find-paths-leaf root-hid [:data :products :product :images :image] "img2.jpg") 
img2-prod-paths  (mapv #(drop-last 2 %) img2-paths) 
img2-prod-hids  (mapv last img2-prod-paths) 
img2-trees-hiccup (mapv hid->hiccup img2-prod-hids) 

red-sect-paths  (find-paths-leaf root-hid [:data :products :product :section] "Red Section") 
red-prod-paths  (mapv #(drop-last 1 %) red-sect-paths) 
red-prod-hids  (mapv last red-prod-paths) 
red-trees-hiccup  (mapv hid->hiccup red-prod-hids)] 
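Once the data is in hiccup form, the three queries can be approximated with tree-seq and set lookups from clojure.core. This sketch works on a literal copy of tree-2; nodes, product-trees, img2-trees and red-trees are hypothetical names, and it is not a stand-in for tupelo.forest's path-based find functions.

```clojure
(ns example.hiccup-queries)

(def tree-2
  [:data
   [:products
    [:product [:section "Red Section"]   [:images [:image "img.jpg"] [:image "img2.jpg"]]]
    [:product [:section "Blue Section"]  [:images [:image "img.jpg"] [:image "img3.jpg"]]]
    [:product [:section "Green Section"] [:images [:image "img.jpg"] [:image "img2.jpg"]]]]])

(defn nodes
  "Hypothetical helper: every vector node of a hiccup tree, depth-first.
  Children are the `rest` of each vector; text strings are filtered out."
  [tree]
  (filter vector? (tree-seq vector? rest tree)))

;; 1. all product subtrees
(def product-trees (filter #(= :product (first %)) (nodes tree-2)))

;; 2. products anywhere below which [:image "img2.jpg"] appears
(def img2-trees
  (filter #(some #{[:image "img2.jpg"]} (nodes %)) product-trees))

;; 3. products with a direct child [:section "Red Section"]
(def red-trees
  (filter #(some #{[:section "Red Section"]} (rest %)) product-trees))
```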

With the results:

(is= product-trees-hiccup 
    [[:product 
    [:section "Red Section"] 
    [:images 
     [:image "img.jpg"] 
     [:image "img2.jpg"]]] 
    [:product 
    [:section "Blue Section"] 
    [:images 
     [:image "img.jpg"] 
     [:image "img3.jpg"]]] 
    [:product 
    [:section "Green Section"] 
    [:images 
     [:image "img.jpg"] 
     [:image "img2.jpg"]]]]) 

(is= img2-trees-hiccup 
    [[:product 
    [:section "Red Section"] 
    [:images 
    [:image "img.jpg"] 
    [:image "img2.jpg"]]] 
    [:product 
    [:section "Green Section"] 
    [:images 
    [:image "img.jpg"] 
    [:image "img2.jpg"]]]]) 

(is= red-trees-hiccup 
    [[:product 
    [:section "Red Section"] 
    [:images 
    [:image "img.jpg"] 
    [:image "img2.jpg"]]]])))) 

The complete example can be found in the forest-examples unit test.