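Scraping data from Facebook using Scrapy
Facebook's new Graph Search lets you search for a company's current employees using a query token, for example the current employees of Google.
I want to scrape the results page (http://www.facebook.com/search/104958162837/employees/present) with Scrapy.
The initial problem was that Facebook only shows this information to a logged-in Facebook user, so my request was redirected to login.php. To get around that, I log in through Scrapy first and only then request the results page. But even though the HTTP response for the results page is 200, no data is scraped from it. The code is as follows: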
from scrapy import log
from scrapy.spider import BaseSpider
from scrapy.http import FormRequest, Request
from scrapy.selector import HtmlXPathSelector

# Assumed to be the Graph Search results page URL mentioned above
query = 'http://www.facebook.com/search/104958162837/employees/present'


class DmozSpider(BaseSpider):
    name = "test"
    start_urls = ['https://www.facebook.com/login.php']
    task_urls = [query]

    def parse(self, response):
        # Fill in and submit the login form found on login.php
        return [FormRequest.from_response(
            response,
            formname='login_form',
            formdata={'email': 'myemailid', 'pass': 'myfbpassword'},
            callback=self.after_login)]

    def after_login(self, response):
        if "authentication failed" in response.body:
            self.log("Login failed", level=log.ERROR)
            return
        # Login appears to have worked: now request the results page
        return Request(query, callback=self.page_parse)

    def page_parse(self, response):
        hxs = HtmlXPathSelector(response)
        print hxs
        # Select the result entries; this comes back empty
        items = hxs.select('//div[@class="_4_yl"]')
        count = 0
        print items
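One check that might help narrow this down is to look at whether the class name `_4_yl` is present in the response body at all, and if so, whether it sits on real elements or only inside HTML comments, where an XPath such as `//div[@class="_4_yl"]` would not see it. The following is only a minimal sketch, written for Python 3 with the standard library; `ClassProbe` and `probe` are hypothetical helper names, and the sample HTML is made up for illustration, not taken from Facebook's actual response.

```python
from html.parser import HTMLParser


class ClassProbe(HTMLParser):
    """Count where a CSS class name shows up in an HTML document."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.in_elements = 0  # start tags whose class list contains the name
        self.in_comments = 0  # comments whose text mentions the name

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or valueless (None)
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.in_elements += 1

    def handle_comment(self, data):
        if self.class_name in data:
            self.in_comments += 1


def probe(html, class_name):
    """Return (element_count, comment_count) for class_name in html."""
    parser = ClassProbe(class_name)
    parser.feed(html)
    parser.close()
    return parser.in_elements, parser.in_comments


# Made-up sample: one real element, one copy hidden inside a comment
sample = (
    '<div class="_4_yl other">A</div>'
    '<code><!-- <div class="_4_yl">B</div> --></code>'
)
print(probe(sample, "_4_yl"))  # prints (1, 1)
```

Running `probe` on the saved body of the results page would show which case applies: `(0, 0)` would suggest the markup is not in the response at all, while a non-zero comment count with zero elements would suggest the markup is there but not as parsed elements. Note also that the first `div` in the sample carries two classes, so an exact-match XPath on `@class` would not select it either.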
Am I missing something, or doing something wrong?
Thanks in advance.