XPath with libxml2

Extracting text from multiple tags in an iOS app with libxml2

On iOS, I am parsing this HTML snippet (part of a larger page):

... 
<span class="ingredient"> 
    <span class="amount"> 
     <span class="value">500 </span> 
     <span class="type">g</span> 
    </span>  
    <a href="...">bread flour</a> 
    or 
    <span class="ingredient"> 
     <span class="amount"> 
      <span class="value">500 </span> 
      <span class="type">g</span> 
     </span> 
     <span class="name"> 
      <a href="...">all-purpose flour</a> 
     </span> 
    </span> 
</span> 
...

I need to extract only the text: "500 g bread flour or 500 g all-purpose flour".

The parsed NSDictionary returned for the XPath query //span[@class="ingredient"]:

{ 
    nodeAttributeArray =  (
       { 
      attributeName = class; 
      nodeContent = ingredient; 
     } 
    ); 
    nodeChildArray =  (
       { 
      nodeAttributeArray =    (
           { 
        attributeName = class; 
        nodeContent = amount; 
       } 
      ); 
      nodeChildArray =    (
           { 
        nodeAttributeArray =      (
               { 
          attributeName = class; 
          nodeContent = value; 
         } 
        ); 
        nodeContent = 500; 
        nodeName = span; 
       }, 
           { 
        nodeAttributeArray =      (
               { 
          attributeName = class; 
          nodeContent = type; 
         } 
        ); 
        nodeContent = g; 
        nodeName = span; 
       } 
      ); 
      nodeContent = ""; 
      nodeName = span; 
     }, 
       { 
      nodeAttributeArray =    (
           { 
        attributeName = href; 
        nodeContent = "http://www.food.com/library/flour-64"; 
       } 
      ); 
      nodeContent = "bread flour"; 
      nodeName = a; 
     }, 
       { 
      nodeAttributeArray =    (
           { 
        attributeName = class; 
        nodeContent = ingredient; 
       } 
      ); 
      nodeChildArray =    (
           { 
        nodeAttributeArray =      (
               { 
          attributeName = class; 
          nodeContent = amount; 
         } 
        ); 
        nodeChildArray =      (
               { 
          nodeAttributeArray =        (
                   { 
            attributeName = class; 
            nodeContent = value; 
           } 
          ); 
          nodeContent = 500; 
          nodeName = span; 
         }, 
               { 
          nodeAttributeArray =        (
                   { 
            attributeName = class; 
            nodeContent = type; 
           } 
          ); 
          nodeContent = g; 
          nodeName = span; 
         } 
        ); 
        nodeContent = ""; 
        nodeName = span; 
       }, 
           { 
        nodeAttributeArray =      (
               { 
          attributeName = class; 
          nodeContent = name; 
         } 
        ); 
        nodeChildArray =      (
               { 
          nodeAttributeArray =        (
                   { 
            attributeName = href; 
            nodeContent = "http://www.food.com/library/flour-64"; 
           } 
          ); 
          nodeContent = "all-purpose flour"; 
          nodeName = a; 
         } 
        ); 
        nodeContent = ""; 
        nodeName = span; 
       } 
      ); 
      nodeContent = ""; 
      nodeName = span; 
     } 
    ); 
    nodeContent = or; 
    nodeName = span; 
}

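The problem is that the root "nodeContent" in the dictionary is the text "or", and all the tags sit as children of the root node, so the order of the pieces is lost: I can't tell that "or" is actually in the middle. If I concatenate all the text, I get the string "or 500 g bread flour 500 g all-purpose flour".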

Can anyone suggest a way to extract the plain text with a single XPath query, or to use the XPath engine to read an ordered list of elements?
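For illustration, here is a minimal sketch showing that the order is still present in the parsed tree even though the dictionary flattens it. It uses Python's standard-library xml.etree.ElementTree instead of libxml2 (an assumption made only to keep the example self-contained), and the href values keep the "..." placeholders from the question. In the tree, "or" is a separate piece of text that follows the first <a>, and a document-order walk over the subtree returns it between the two amounts:

```python
import xml.etree.ElementTree as ET

# The ingredient markup from the question; the href values keep the "..."
# placeholders, which is still well-formed XML.
SNIPPET = """<span class="ingredient">
    <span class="amount">
     <span class="value">500 </span>
     <span class="type">g</span>
    </span>
    <a href="...">bread flour</a>
    or
    <span class="ingredient">
     <span class="amount">
      <span class="value">500 </span>
      <span class="type">g</span>
     </span>
     <span class="name">
      <a href="...">all-purpose flour</a>
     </span>
    </span>
</span>"""

root = ET.fromstring(SNIPPET)

# "or" is not a property of the outer span: ElementTree stores it as the
# "tail" of the first <a>, i.e. the text that follows that element.
first_link = root.find("a")
print(repr(first_link.tail.strip()))  # 'or'

# itertext() yields element text and tails in document order, so "or"
# comes out between the two amounts instead of first.
pieces = [t.strip() for t in root.itertext() if t.strip()]
print(pieces)
# ['500', 'g', 'bread flour', 'or', '500', 'g', 'all-purpose flour']
```

ElementTree's XPath subset has no text() step, so itertext() plays that role here; with libxml2 itself, the equivalent is selecting the text nodes rather than the enclosing elements.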


2013-04-06 Kof

You're running some query but not showing it. :) – 2013-04-06 17:57:24

I don't think it's relevant, but maybe I'm wrong :) – Kof 2013-04-07 06:49:32

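Since you need all text nodes, this can easily be done using

//text()

which returns all of them in document order. Your content has some whitespace issues; to omit all whitespace-only nodes, you can use

//text()[normalize-space()]

(A regex filter using matches() would need XPath 2.0, which libxml2 does not implement; normalize-space() is part of XPath 1.0.)

You will still need to do some fine-tuning in Objective-C afterwards (e.g. for the "g"), but you should get an ordered result set of all text nodes that contain printable characters.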
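The filtering and fine-tuning steps can be sketched the same way, again with ElementTree standing in for libxml2 (an assumption; itertext() plays the role of //text()). Dropping whitespace-only pieces corresponds to the whitespace filter above, and joining the remaining pieces with single spaces is one possible clean-up that turns "500" and "g" back into "500 g". The name ingredient_text is a hypothetical helper, not part of any library:

```python
import xml.etree.ElementTree as ET

# Same snippet as in the question (href placeholders kept as "...").
SNIPPET = """<span class="ingredient">
    <span class="amount">
     <span class="value">500 </span>
     <span class="type">g</span>
    </span>
    <a href="...">bread flour</a>
    or
    <span class="ingredient">
     <span class="amount">
      <span class="value">500 </span>
      <span class="type">g</span>
     </span>
     <span class="name">
      <a href="...">all-purpose flour</a>
     </span>
    </span>
</span>"""

def ingredient_text(element):
    """Hypothetical helper: the subtree's text in document order, with
    whitespace-only pieces dropped and the rest joined by single spaces."""
    # Collapse runs of whitespace inside each piece ("500 " -> "500").
    pieces = (" ".join(t.split()) for t in element.itertext())
    # Skip pieces that were only whitespace, then rejoin with one space,
    # which turns the separate "500" and "g" pieces into "500 g".
    return " ".join(p for p in pieces if p)

root = ET.fromstring(SNIPPET)
text = ingredient_text(root)
print(text)  # 500 g bread flour or 500 g all-purpose flour
```

For a single element, XPath 1.0 defines a comparable result directly: normalize-space(//span[@class="ingredient"]) takes the string value of the first matching span (all of its descendant text, in document order) and collapses the whitespace, which for this snippet also gives "500 g bread flour or 500 g all-purpose flour". Whether the NSDictionary-based wrapper in the question can return a string result rather than nodes is an open assumption.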


2013-04-06 17:56:58

The given HTML is only part of the whole page, so getting all text elements won't help in this case. I'll update my question. Thanks! – Kof 2013-04-07 06:47:02

But it's a good lead: using //span//text() returned all the text elements in the correct order. – Kof 2013-04-07 06:54:36
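Scoping the text extraction to the ingredient spans of a larger page can be sketched as follows, still with ElementTree in place of libxml2 and with a made-up surrounding page (the <html>/<p> wrapper is hypothetical). One difference worth noting: an XPath node-set holds each text node only once, but looping over every span[@class="ingredient"] in ElementTree would visit the nested ingredient twice, so the sketch keeps only the outermost matches:

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the question's ingredient markup (condensed) placed
# among unrelated content, which a bare //text() would also return.
PAGE = """<html><body>
<p>Unrelated page text</p>
<span class="ingredient">
 <span class="amount"><span class="value">500 </span><span class="type">g</span></span>
 <a href="...">bread flour</a>
 or
 <span class="ingredient">
  <span class="amount"><span class="value">500 </span><span class="type">g</span></span>
  <span class="name"><a href="...">all-purpose flour</a></span>
 </span>
</span>
</body></html>"""

root = ET.fromstring(PAGE)

# ElementTree's limited XPath supports attribute predicates like this one;
# it matches both the outer and the nested ingredient span.
matches = root.findall(".//span[@class='ingredient']")

# Drop matches nested inside another match so no text is collected twice.
nested = {el for outer in matches for el in outer.iter() if el is not outer}
outermost = [m for m in matches if m not in nested]

results = []
for span in outermost:
    # Normalize whitespace per piece, skip empty pieces, join with spaces.
    words = (" ".join(t.split()) for t in span.itertext())
    results.append(" ".join(w for w in words if w))
print(results)  # ['500 g bread flour or 500 g all-purpose flour']
```

With libxml2 the same scoping would be expressed in the query itself, e.g. //span[@class="ingredient"]//text(), leaving only the whitespace clean-up to Objective-C.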
