Pagination using Scrapy

I want to scrape this website with pagination, using Scrapy: http://www.aido.com/eshop/cl_2-c_189-p_185/stationery/pens.html

I can get all the products on this page, but how do I issue a request for the "View More" link at the bottom of the page?

My code so far is:

from scrapy.contrib.spiders import Rule  # Scrapy 0.x (2013-era) import paths
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

# Class attribute of a CrawlSpider subclass.
rules = (
    Rule(SgmlLinkExtractor(restrict_xpaths='//li[@class="normalLeft"]/div/a', unique=True)),
    Rule(SgmlLinkExtractor(restrict_xpaths='//div[@id="topParentChilds"]/div/div[@class="clm2"]/a', unique=True)),
    Rule(SgmlLinkExtractor(restrict_xpaths='//p[@class="proHead"]/a', unique=True)),
    Rule(SgmlLinkExtractor(allow=(r'http://[^/]+/[^/]+/[^/]+/[^/]+$',), deny=('/about-us/about-us/contact-us', './music.html'), unique=True), callback='parse_item'),
)

Any help?

2013-04-21 Vanddel

First, have a look at this thread on how to deal with scraping content that is loaded dynamically via AJAX: Can scrapy be used to scrape dynamic content from websites that are using AJAX?

Clicking the "View More" button triggers an XHR request:

http://www.aido.com/eshop/faces/tiles/category.jsp?q=&categoryID=189&catalogueID=2&parentCategoryID=185&viewType=grid&bnm=&atmSize=&format=&gender=&ageRange=&actor=&director=&author=&region=&compProductType=&compOperatingSystem=&compScreenSize=&compCpuSpeed=&compRam=&compGraphicProcessor=&compDedicatedGraphicMemory=&mobProductType=&mobOperatingSystem=&mobCameraMegapixels=&mobScreenSize=&mobProcessor=&mobRam=&mobInternalStorage=&elecProductType=&elecFeature=&elecPlaybackFormat=&elecOutput=&elecPlatform=&elecMegaPixels=&elecOpticalZoom=&elecCapacity=&elecDisplaySize=&narrowage=&color=&prc=&k1=&k2=&k3=&k4=&k5=&k6=&k7=&k8=&k9=&k10=&k11=&k12=&startPrize=&endPrize=&newArrival=&entityType=&entityId=&brandId=&brandCmsFlag=&boutiqueID=&nmt=&disc=&rat=&cts=empty&isBoutiqueSoldOut=undefined&sort=12&isAjax=true&hstart=24&targetDIV=searchResultDisplay

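It returns the next 24 items as text/html. Note the hstart=24 parameter: on the first click of "View More" it equals 24, on the second click 48, and so on. That should be your lifesaver.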
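A small standard-library sketch of that arithmetic: it takes the XHR URL above and rewrites only `hstart` for the n-th click (n × 24), copying every other parameter unchanged. The names `AJAX_URL`, `PAGE_SIZE` and `view_more_url` are illustrative helpers, not part of the site or of Scrapy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# XHR URL fired by the first "View More" click (hstart=24), as captured above.
# The empty filter parameters are kept in case the server expects them.
AJAX_URL = (
    "http://www.aido.com/eshop/faces/tiles/category.jsp"
    "?q=&categoryID=189&catalogueID=2&parentCategoryID=185&viewType=grid"
    "&bnm=&atmSize=&format=&gender=&ageRange=&actor=&director=&author="
    "&region=&compProductType=&compOperatingSystem=&compScreenSize="
    "&compCpuSpeed=&compRam=&compGraphicProcessor="
    "&compDedicatedGraphicMemory=&mobProductType=&mobOperatingSystem="
    "&mobCameraMegapixels=&mobScreenSize=&mobProcessor=&mobRam="
    "&mobInternalStorage=&elecProductType=&elecFeature="
    "&elecPlaybackFormat=&elecOutput=&elecPlatform=&elecMegaPixels="
    "&elecOpticalZoom=&elecCapacity=&elecDisplaySize=&narrowage=&color="
    "&prc=&k1=&k2=&k3=&k4=&k5=&k6=&k7=&k8=&k9=&k10=&k11=&k12="
    "&startPrize=&endPrize=&newArrival=&entityType=&entityId=&brandId="
    "&brandCmsFlag=&boutiqueID=&nmt=&disc=&rat=&cts=empty"
    "&isBoutiqueSoldOut=undefined&sort=12&isAjax=true&hstart=24"
    "&targetDIV=searchResultDisplay"
)

PAGE_SIZE = 24  # each "View More" response carries the next 24 items


def view_more_url(click: int, base_url: str = AJAX_URL) -> str:
    """Return the XHR URL for the n-th "View More" click (1-based)."""
    parts = urlsplit(base_url)
    # keep_blank_values=True preserves the empty parameters such as "q=".
    params = parse_qsl(parts.query, keep_blank_values=True)
    params = [
        (key, str(click * PAGE_SIZE) if key == "hstart" else value)
        for key, value in params
    ]
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(view_more_url(n))
```

Rewriting the captured URL instead of hand-building a new one avoids guessing which of the many empty parameters the server actually needs.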

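Now you need to simulate these requests in your spider. The recommended way is to instantiate Scrapy's Request objects yourself, passing a callback in which you extract the data.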

Hope that helps.

2013-04-21 17:01:49 alecxe

This is helpful, but an example of how to "instantiate Scrapy's Request object" would be even more helpful. – SMPLGRP 2013-10-18 20:04:22
