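How do I restrict a Scrapy spider to a specific XPath?
I am trying to scrape a specific XPath from a website. On the product page I want to scrape the product description, but how do I select only the product description?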
XPath: hxs.select('//div[@class="product-shop"]/p/text()').extract()
The HTML is fairly large, so please see the link above.
I want to select only the product description, without any of the other details.
If I do this:
[" ".join([i.strip() for i in hxs.select('//div[@class="product-shop"]/p/text()').extract()])]
Output:
[u'Itemcode: 12BTS28271 Brand: BASICS InStock - Ships within 2 business days. Tip: 90% of our shipments reach within 4 business days! This product is part of the Basics T.shirts line made of 100% Cotton. Stripes Muscle Fit T.shirts that come in Green Color. Casual that comes with Henley away.']
But I only want the following (the element as it appears in Chrome's Elements panel):
[u'This product is part of the Basics T.shirts line made of 100% Cotton. Stripes Muscle Fit T.shirts that come in Green Color. Casual that comes with Henley away.']
Is there a regular expression, or some other way, to exclude the unwanted parts of the XPath result?
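
For illustration, here is a minimal sketch of the regular-expression approach asked about, applied to the joined string shown in the output above. It assumes the description always comes after the "Tip: ...!" sentence, which is only a guess based on this one page; the names `DESCRIPTION_AFTER_TIP` and `extract_description` are hypothetical and not part of Scrapy. If the description paragraph has a distinguishing attribute in the markup, a tighter XPath would likely be more robust than this.

```python
import re

# Joined text exactly as shown in the output above.
joined = (
    u"Itemcode: 12BTS28271 Brand: BASICS InStock - Ships within 2 business days. "
    u"Tip: 90% of our shipments reach within 4 business days! "
    u"This product is part of the Basics T.shirts line made of 100% Cotton. "
    u"Stripes Muscle Fit T.shirts that come in Green Color. "
    u"Casual that comes with Henley away."
)

# Assumption: the description is whatever follows the first "Tip: ...!" sentence.
DESCRIPTION_AFTER_TIP = re.compile(r"Tip:.*?!\s*(.*)$", re.DOTALL)


def extract_description(text):
    """Return the text after the 'Tip: ...!' sentence, or the whole text if absent."""
    match = DESCRIPTION_AFTER_TIP.search(text)
    return match.group(1).strip() if match else text.strip()


description = extract_description(joined)
print([description])
```

The fallback to the full text means a page without a "Tip:" line still yields something, although it would then include the item code and brand again.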