Phantom <span> element when using ImportXML with XPath in Google Spreadsheets

I am trying to get the value of an element attribute from this site via importXML with XPath in a Google Spreadsheet.

The attribute value I am looking for is content, found in the <span> with itemprop="price".

<div class="left" style="margin-top: 10px;"> 
    <meta itemprop="currency" content="RON"> 
     <span class="pret" itemprop="price" content="698,31 RON"> 
      <p class="pret">Pretul tau:</p> 
      698,31 RON 
     </span> 
... 
</div>

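I can reach <div class="left">, but I cannot reach the <span> element.

I tried:

//span[@class='pret']/@content gives me #N/A;
//span[@itemprop='price']/@content gives me #N/A;
//div[@class='left']/span[@class='pret' and @itemprop='price']/@content gives me #N/A;
//div[@class='left']/span[1]/@content gives me #N/A;
//div[@class='left']/span/text(), to get the text node, gives me #N/A;
//div[@class='left']//span/text() gives me the text node of a span further down inside div.left.

To get the text node I have to use //div[@class='left']/text(). But I cannot rely on that text node, because the layout of the span changes when the product is on sale, so I need the attribute.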

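It is as if the span I am looking for does not exist, although it shows up in Chrome's developer view and in the page source, and all of the XPaths work in the console using $x("").

I tried generating the XPath directly from the developer tools by right-clicking, and I got //*[@id='produs']/div[4]/div[4]/div[1]/span, which does not work. I also tried generating the XPath with Firefox and with plugins for FF and Chrome, to no avail. XPaths generated this way did not even work on sites that I had managed to scrape with hand-coded XPath.

Now, the strangest thing is that on this other site, with an apparently similar code structure, the XPath //span[@itemprop='price']/@content works.

I have been struggling with this for 4 days now. I am starting to think it has something to do with the self-closing meta tag, but then why does it not happen on the other site?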

2013-10-27 Macovei Vlad

Maybe one of the following formulas can help you:

=ImportXML("http://...","//div[@class='product-info-price']//div[@class='left']/text()")

or

=INDEX(ImportXML("http://...","//div[@class='product-info-price']//div[@class='left']"), 1, 2)

UPDATE

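It seems that ImportXML fails because it does not parse the whole document correctly. An extract of the document, such as this: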

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
<html> 
<div class="product-info-price"> 
    <div class="left" style="margin-top: 10px;"> 
     <meta itemprop="currency" content="RON"> 
     <span class="pret" itemprop="price" content="698,31 RON"> 
      <p class="pret">Pretul tau:</p> 
      698,31 RON 
     </span> 
     <div class="resealed-info"> 
      <a href="/resigilate/componente-pc/placi-de-baza/" rel="nofollow">» Vezi 1 resigilat din aceasta categorie</a> 
     </div> 
     <ul style="margin-left: auto;margin-right: auto;width: 200px;text-align: center;margin-top: 20px;"> 
      <li style="color: #000000; font-size: 11px;">Rata de la <b>28,18 RON</b> prin <a href="http://www.marketonline.ro/rate-sapte-stele?amount=698.31#brdfinance" title="BRD Finance" target="_blank" class="rate" rel="nofollow">BRD</a></li> 
      <li style="color: #5F5F5F;text-align: center;">Pretul include TVA</li> 
      <li style="color: #5F5F5F;">Cod produs: <span style="margin-left: 0;text-align: center;font-weight: bold;" itemprop="identifier" content="mol:GA-Z87X-UD3H">GA-Z87X-UD3H</span> </li> 
     </ul> 
    </div> 
    <div class="right" style="height: 103px;line-height: 103px;"> 
     <form action="/?a=shopping&amp;sa=addtocart" method="post" id="add_to_cart_form"> 
      <input type="hidden" name="product-183641" value="on"/> 
      <a href="/adaugaincos-183641" rel="nofollow"><img src="/templates/marketonline/images/pag-prod/buton_cumpara.jpg"/></a> 
     </form> 
    </div> 
</div> 
</html>

works with the following XPath query:

"//div[@class='product-info-price']//div[@class='left']//span[@itemprop='price']/@content"

UPDATE

It occurs to me that one option is to create your own ImportXML function with Apps Script, for example:

/* CODE FOR DEMONSTRATION PURPOSES */
function MyImportXML(url) {
    var content = '';
    var response = UrlFetchApp.fetch(url);
    if (response) {
        var html = response.getContentText();
        // Capture the content attribute of the price span, stopping at the closing quote.
        var match = html && html.match(/<span class="pret" itemprop="price" content="([^"]*)">/i);
        // Return an empty string instead of throwing when the span is not found.
        if (match) content = match[1];
    }
    return content;
}

Then you can use it as follows:

=MyImportXML("http://...")
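The regular expression in that function depends on the attributes appearing in exactly the order class, itemprop, content. As a rough sketch only, the matching step could be made independent of attribute order by inspecting one opening tag at a time. The name extractItempropContent is hypothetical and not part of the original answer, and the sample markup is copied from the question.

```javascript
// Hypothetical helper (not from the original answer): returns the content
// attribute of the first tag whose itemprop equals the given value.
function extractItempropContent(html, itemprop) {
  // Collect every opening tag, then inspect the attributes of each tag,
  // so the order of class / itemprop / content does not matter.
  var tags = html.match(/<[a-zA-Z][^>]*>/g) || [];
  for (var i = 0; i < tags.length; i++) {
    var prop = tags[i].match(/\sitemprop="([^"]*)"/i);
    if (prop && prop[1] === itemprop) {
      var content = tags[i].match(/\scontent="([^"]*)"/i);
      return content ? content[1] : '';
    }
  }
  // Nothing matched: return an empty string rather than throwing.
  return '';
}

// Sample markup copied from the question.
var sampleHtml =
  '<div class="left" style="margin-top: 10px;">' +
  '<meta itemprop="currency" content="RON">' +
  '<span class="pret" itemprop="price" content="698,31 RON">' +
  '<p class="pret">Pretul tau:</p>698,31 RON</span></div>';

var price = extractItempropContent(sampleHtml, 'price');
var currency = extractItempropContent(sampleHtml, 'currency');
```

In Apps Script, the html argument would presumably be the string returned by UrlFetchApp.fetch(url).getContentText(). This is still regex matching rather than real HTML parsing, so it assumes double-quoted attribute values with no ">" inside them.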

2013-10-27 08:57:13 wchiquito

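Thanks for the answer, but I need the attribute value of the span, not the text node. I did not know about the INDEX function; it is very useful, but unfortunately not in this case.

@MacoveiVlad Maybe the latest update to the answer can help in some way. – wchiquito

Thank you very much for the custom MyImportXML function; it solves the problem for me for now!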

Try something like this:

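# Assumption: tree was parsed beforehand with lxml, e.g. tree = lxml.html.fromstring(page_source)
print('content by key', tree.xpath('//*[@itemprop="price"]')[0].get('content'))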

or

nodes = tree.xpath('//div/meta/span')
for node in nodes:
    print('content =', node.get('content'))

But I have not tried it.

2013-10-28 10:09:21

At this time, the web page referred to in the first link no longer includes a span tag with itemprop="price", but the following XPath returns 639:

//b[@itemprop='price']

It seems to me that the problem was that the meta tags were not XHTML compliant, but now all the meta tags are properly closed.

Before:

<meta itemprop="currency" content="RON">

Now:

<meta itemprop="priceCurrency" content="RON" />

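When a web page is not XHTML compliant, another solution should be used instead of IMPORTXML, such as IMPORTDATA with REGEXEXTRACT, or Google Apps Script with the UrlFetch service and the JavaScript match function, among other alternatives.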
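As a minimal sketch of the match-based alternative for the page's newer markup, the function below pulls out the text inside the element that carries itemprop="price". The name extractItempropText and the sample string are assumptions for illustration: the answer only states that //b[@itemprop='price'] returns 639, so the exact markup around the <b> element is guessed.

```javascript
// Hypothetical helper: returns the trimmed text inside the first element
// that carries the given itemprop, e.g. <b itemprop="price">639</b>.
function extractItempropText(html, itemprop) {
  // Capture the tag name so that the same name has to close the element.
  // Assumes the element holds plain text only (no nested tags) and that
  // itemprop is a plain word, since it is inserted into the pattern unescaped.
  var pattern = new RegExp(
    '<([a-zA-Z][\\w-]*)\\b[^>]*\\sitemprop="' + itemprop + '"[^>]*>([^<]*)</\\1>',
    'i'
  );
  var match = html.match(pattern);
  return match ? match[2].trim() : '';
}

// Assumed markup, not taken from the page itself.
var sampleHtml = '<p>Pret: <b itemprop="price">639</b> RON</p>';
var priceText = extractItempropText(sampleHtml, 'price');
```

In Apps Script the html string would come from UrlFetchApp.fetch(url).getContentText(); the function returns an empty string when nothing matches, so a spreadsheet cell shows a blank instead of an error.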

2016-05-30 23:14:07
