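I am trying to fetch the latest reviews from the Google Play store. To do this, I am following the question "Scraping dynamic content using scrapy".
The method given in the answer to that question works fine in the scrapy shell, but when I try it in my crawler it is completely ignored.
Code snippet: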
import re
import sys
import time
import urllib
import urlparse
from scrapy import Spider
from scrapy.spider import BaseSpider
from scrapy.http import Request, FormRequest
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.lxmlhtml import LxmlLinkExtractor
from play.items import PlayApp


class PlaySpider(CrawlSpider):
    name = "play"
    allowed_domains = ["play.google.com"]
    start_urls = [
        "https://play.google.com/store/apps"
    ]
    rules = (
        Rule(LxmlLinkExtractor(allow=('/store/apps$',)), callback='parseCategory', follow=True),
    )

    def parseCategory(self, response):
        """
        Gets the categories from the store home page and calls parseLinks for each category.
        """
        # something here......
        yield Request(categoryapps, callback=self.parseLinks)

    def parseLinks(self, response):
        '''
        Gets all the links from the category page and then
        passes the individual links to the parseApp function.
        '''
        # something here
        yield Request(link, callback=self.parseApp)

    def parseApp(self, response):
        '''
        Parses the app page to get info about the app.
        '''
        # application page parsing ......
        frmdata = {"id": "com.supercell.boombeach", "reviewType": '0', "reviewSortOrder": '0', "pageNum": '0'}
        url = "https://play.google.com/store/getreviews"
        yield FormRequest(url, callback=self.parse_data, formdata=frmdata)
        yield app

    def parse_data(self, response):
        # do stuff with data...
        print '\n\n---------------I am here------------------\n\n'
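The parse_data function is never called. I asked about this on the #scrapy IRC channel and in a few other places, but got no help. Please help me resolve this.
Here is the DEBUG output from the terminal: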
DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=isoft.studios.ncert.ncertbooks)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=af.hindi.stories.booktwo)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=com.frozenex.latestnewsms)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=com.aqua.apps.english.hindi.dictionary)
2015-06-03 13:56:07+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=com.merriamwebster)
2015-06-03 13:56:08+0530 [play] DEBUG: Crawled (200) <POST https://play.google.com/store/getreviews> (referer: https://play.google.com/store/apps/details?id=an.HindiTranslate)
So the POST requests are definitely being sent, but the callback method is never called.
Does program control actually reach the sample review code in 'parseApp()'? – Jithin
Yes, I get the app data from there and store it in MongoDB. –
You are missing the 'id' here – Jithin
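One possible reading of the last comment: the form data in parseApp() hardcodes "id" as "com.supercell.boombeach", while the referers in the DEBUG log are different app pages. Below is a minimal sketch of building the form data from each app's own details URL. It assumes the package ID is the "id" query parameter of that URL, and build_review_formdata is a hypothetical helper, not part of Scrapy. It only shows how the per-app form data could be derived and is not confirmed as the reason the callback is not called.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2, as used in the question


def build_review_formdata(details_url, page_num=0):
    """Hypothetical helper: build the getreviews form data for one app page.

    Assumes the package ID is the 'id' query parameter of the details URL,
    e.g. https://play.google.com/store/apps/details?id=com.merriamwebster
    """
    query = parse_qs(urlparse(details_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError("no 'id' parameter in %r" % details_url)
    # Same keys as the frmdata dict in the question; all values are strings.
    return {
        "id": ids[0],
        "reviewType": "0",
        "reviewSortOrder": "0",
        "pageNum": str(page_num),
    }


if __name__ == "__main__":
    url = "https://play.google.com/store/apps/details?id=com.merriamwebster"
    print(build_review_formdata(url))
```

In parseApp() this could replace the hardcoded dict, for example frmdata = build_review_formdata(response.url), before the FormRequest is yielded.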