Struggling to build an XPath

I am trying to build an XPath for this page.

The items I want to scrape are the brand, model, and price of every smartphone, as shown in the screenshot:

[screenshot]

But I am struggling to build a working main XPath. I tested several XPaths and ended up with this one:

sel.xpath('//div[@style="position: relative;"]').extract()

But it did not work.

Any hints?

2014-03-31 user3404005

For the brand and model name, select by the class attribute:

//div[@class="m_product_title_div"]/text()

For the price, you can select by the id attribute:

//div[@id="m_product_price_div"]/text()

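I tested these XPath expressions in the Chrome console (using the $x('xpath_here') syntax).

You may need to make these XPath expressions relative to each phone-specific block (.//div[@class="m_product_title_div"]/text()) and call strip() to remove leading and trailing spaces and newlines.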
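
To illustrate what "relative to the block" plus strip() means, here is a minimal sketch. It rests on several assumptions: the markup in SAMPLE is hypothetical and only borrows the class and id names quoted above, the helper name scrape_blocks is made up, and the standard-library xml.etree.ElementTree stands in for Scrapy's selector. ElementTree supports only a subset of XPath (no text() step), so the sketch reads .text and .tail instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the class/id names come from the thread.
SAMPLE = """
<table class="m_product_previews">
  <tr>
    <td><a href="#1">
      <div class="m_product_title_div"><span>BrandA</span>
        Model One
      </div>
      <div id="m_product_price_div">
        100 GEL
      </div>
    </a></td>
    <td><a href="#2">
      <div class="m_product_title_div"><span>BrandB</span>
        Model Two
      </div>
      <div id="m_product_price_div">
        200 GEL
      </div>
    </a></td>
  </tr>
</table>
"""


def scrape_blocks(markup):
    """Return one dict per product block, with whitespace stripped."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Select the per-product blocks first.
    for block in root.findall(".//td/a"):
        # The leading "." keeps each query inside the current block,
        # so one product never picks up another product's values.
        title_div = block.find(".//div[@class='m_product_title_div']")
        price_div = block.find(".//div[@id='m_product_price_div']")
        if title_div is None or price_div is None:
            continue
        span = title_div.find("span")
        if span is None:
            continue
        rows.append({
            "brand": (span.text or "").strip(),
            # .tail is the text that follows </span> inside the div.
            "title": (span.tail or "").strip(),
            "price": (price_div.text or "").strip(),
        })
    return rows


for row in scrape_blocks(SAMPLE):
    print(row)
```

An absolute expression starting with // searches the whole document on every iteration, which is why the per-block queries start with a dot.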

UPD (a spider that scrapes the brand, title, and price):

from scrapy import Field, Item, Spider


class MobiItem(Item):
    brand = Field()
    title = Field()
    price = Field()


class MobiSpider(Spider):
    name = "mobi"
    allowed_domains = ["mobi.ge"]
    start_urls = [
        "http://mobi.ge/?page=products&category=60"
    ]

    def parse(self, response):
        # Each <a> inside the preview table wraps one product block.
        blocks = response.xpath('//table[@class="m_product_previews"]/tr/td/a')
        for block in blocks:
            item = MobiItem()
            try:
                # The brand sits in a <span>; the model name is the text node right after it.
                item["brand"] = block.xpath(".//div[@class='m_product_title_div']/span/text()").extract()[0].strip()
                item["title"] = block.xpath(".//div[@class='m_product_title_div']/span/following-sibling::text()").extract()[0].strip()
                item["price"] = block.xpath(".//div[@id='m_product_price_div']/text()").extract()[0].strip()
            except IndexError:
                # Skip blocks that lack one of the expected nodes.
                continue
            yield item

Scraped output:

{'brand': u'Samsung', 'price': u'695 GEL', 'title': u'G7102 Grand dous 2'} 
{'brand': u'Samsung', 'price': u'572 GEL', 'title': u'I9060 Galaxy grand...'} 
...

2014-03-31 22:34:39 alecxe

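Thanks for your reply. './/div[@class="m_product_title_div"]/text()' returns an empty string. – user3404005

@user3404005 Can you show the relevant code of your spider so that I can debug the issue? FYI, the XPaths work in the Chrome console. – alecxe

Because of the main XPath problem, I have not built the spider yet (although I have built several of them before). – user3404005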

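Select all products using the XPath expression //div[@class="m_product_preview_div"]. Then loop over them and run these XPath queries from the context of each product to fetch the values mentioned above:

./div[@class="m_product_title_div"]/span[@class="like_link"]/text() for the vendor (given it is linked)
./div[@class="m_product_title_div"]/text() for the product name
./div[@id="m_product_price_div"]/text() for the price

You will very likely want to trim whitespace afterwards. While this is possible in XPath with normalize-space(...), I would probably do it in Python.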
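
As a rough sketch of the "do it in Python" option: XPath's normalize-space() strips leading and trailing whitespace and collapses internal runs of whitespace into a single space, and split-then-join gives much the same result. The helper name normalize_space and the sample string are assumptions made for illustration. Note that Python's split() treats more Unicode characters as whitespace than XPath does.

```python
def normalize_space(text):
    """Approximate XPath normalize-space() on a Python string."""
    # split() with no arguments drops leading/trailing whitespace and
    # splits on any run of whitespace; join puts single spaces back.
    return " ".join(text.split())


# Hypothetical text node as it might come out of an HTML page.
raw = "\n      Model   One\n   Plus  \n"

print(repr(raw.strip()))           # strip() only trims the two ends
print(repr(normalize_space(raw)))  # internal runs are collapsed too
```

If the text nodes contain no internal line breaks, a plain strip() is enough; the join/split form only matters when a value is wrapped across lines in the markup.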

2014-04-01 08:01:32

Thanks. 'sel.xpath('//div[@class="m_product_preview_div"]').extract()' returns an empty result when tested in the terminal. – user3404005
