I want to use a Scrapy spider to collect second-level data (question title + content & answers) from all posts on the following website.
The problem is that I simply don't know how to make it first follow the links to the individual posts and then scrape the data of all 15 posts per page.
import scrapy

class ArticleSpider(scrapy.Spider):
    name = 'post'
    start_urls = ['https://forums.att.com/t5/Data-Messaging-Features-Internet/Throttling-for-unlimited-data/m-p/4805201#M73235']

    def parse(self, response):
        SET_SELECTOR = 'body'
        for post in response.css(SET_SELECTOR):
            # Selectors for title, content and answer
            TITLE_SELECTOR = '.lia-message-subject h5 ::text'
            CONTENT_SELECTOR = '.lia-message-body-content'
            ANSWER_SELECTOR = '.lia-message-body-content'
            yield {
                # [0].extract() is equivalent to extract_first()
                'Qtitle': post.css(TITLE_SELECTOR)[0].extract(),
                'Qcontent': post.css(CONTENT_SELECTOR)[0].extract(),
                'Answer': post.css(ANSWER_SELECTOR)[1].extract(),
            }

        # Running through all 173 pages
        NEXT_PAGE_SELECTOR = '.lia-paging-page-next a ::attr(href)'
        next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse
            )
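This is roughly the two-level structure I imagine I need: parse() follows each post link on the overview page into a separate parse_post() callback, and also follows the pagination link. Note that the start URL is only a placeholder and the post-link selector ('.lia-message-subject a::attr(href)') is my guess, not verified against the forum's actual markup:

import scrapy

class PostSpider(scrapy.Spider):
    name = 'post_links'
    # The board page that lists the 15 posts (placeholder; fill in the real URL)
    start_urls = ['...']

    def parse(self, response):
        # Guessed selector for the links to the individual posts;
        # the real class name on the forum may differ
        POST_LINK_SELECTOR = '.lia-message-subject a::attr(href)'
        for href in response.css(POST_LINK_SELECTOR).extract():
            # First level: follow each post link to its own page
            yield scrapy.Request(
                response.urljoin(href),
                callback=self.parse_post
            )

        # Then continue through the pagination as before
        NEXT_PAGE_SELECTOR = '.lia-paging-page-next a ::attr(href)'
        next_page = response.css(NEXT_PAGE_SELECTOR).extract_first()
        if next_page:
            yield scrapy.Request(
                response.urljoin(next_page),
                callback=self.parse
            )

    def parse_post(self, response):
        # Second level: scrape title, question content and answer on the post page
        bodies = response.css('.lia-message-body-content').extract()
        yield {
            'Qtitle': response.css('.lia-message-subject h5 ::text').extract_first(),
            'Qcontent': bodies[0] if bodies else None,
            'Answer': bodies[1] if len(bodies) > 1 else None,
        }

If this structure is right, the only thing I would still need is the correct selector for the post links on the overview page.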
I hope you can help me. Thanks in advance!