Scrapy cannot crawl a link - comments on the vnexpress website

I am new to Scrapy and Python. I am trying to scrape the comments from the following URL, but the result is always empty: http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html

Here is my code:

from scrapy.spiders import Spider
from tutorial.items import TutorialItem

import logging


class TutorialSpider(Spider):
    name = "vnexpress"
    allowed_domains = ["vnexpress.net"]
    start_urls = [
        "http://vnexpress.net/tin-tuc/oto-xe-may/toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html"
    ]

    def parse(self, response):
        # response.xpath() is a shortcut for Selector(response).xpath()
        comment_list = response.xpath('//div[@class="comment_item"]')
        items = []

        logging.info("TOTAL COMMENT : %d", len(comment_list))

        # enumerate() replaces the manual counter, which shadowed the built-in id()
        for comment_id, comment in enumerate(comment_list, start=1):
            item = TutorialItem()

            item['id'] = comment_id
            item['mainId'] = 0
            # The leading "." keeps each XPath relative to the current comment;
            # a bare "//" would search the whole page on every iteration.
            item['user'] = comment.xpath('.//span[@class="left txt_666 txt_11"]/b').extract()
            item['time'] = 'N/A'
            item['content'] = comment.xpath('.//p[@class="full_content"]').extract()
            item['like'] = comment.xpath('.//span[@class="txt_666 txt_11 right block_like_web"]/a[@class="txt_666 txt_11 total_like"]').extract()

            items.append(item)

        return items

Thanks for reading.

Source

2016-05-12 Valentine Heartilly

It looks like the comments are loaded into the page by some JavaScript code.

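Scrapy does not execute JavaScript on the page; it only downloads the HTML. Try opening the page in your browser with JavaScript disabled, and you should see the page the way Scrapy sees it.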
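You can also check this without a browser by counting the comment containers in the raw HTML. Below is a minimal sketch using only the standard library. The `comment_item` class comes from the spider in the question; the two HTML snippets are made-up stand-ins for the raw and the rendered page, not actual vnexpress markup, and `CommentCounter` / `count_comments` are hypothetical helper names.

```python
from html.parser import HTMLParser


class CommentCounter(HTMLParser):
    """Counts <div> elements whose class list includes 'comment_item'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "comment_item" in classes:
            self.count += 1


def count_comments(html):
    """Return how many comment containers appear in an HTML string."""
    parser = CommentCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical raw HTML: the comment box is an empty placeholder.
raw_html = '<html><body><div id="box_comment"></div></body></html>'

# Hypothetical HTML after JavaScript has filled in the comments.
rendered_html = (
    '<html><body><div id="box_comment">'
    '<div class="comment_item"><p class="full_content">First</p></div>'
    '<div class="comment_item"><p class="full_content">Second</p></div>'
    '</div></body></html>'
)

print(count_comments(raw_html))
print(count_comments(rendered_html))
```

For a real check, feed it the body Scrapy actually downloaded (for example `response.text` inside `scrapy shell`). A count of zero there, while the browser shows comments, points to JavaScript loading.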

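You have a few options:

- Reverse-engineer how the comments are loaded into the page, using the Network tab of your browser's developer tools (it is probably some XHR call that loads HTML or JSON data);
- Use a (headless) browser to render the page (Selenium, CasperJS, Splash ...);
  - For example, you may want to try this page with Splash (one of the JavaScript rendering options for web scraping). This is the HTML you get back from Splash (it contains the comments): http://pastebin.com/njgCsM9w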
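To make the Splash option a bit more concrete, here is a minimal sketch that builds a request URL for Splash's `render.html` HTTP endpoint, which returns the page's HTML after JavaScript has run. It assumes a Splash instance is listening on `http://localhost:8050` and that a 2-second wait is long enough for the comments to appear; both values are assumptions you would need to adjust, and `splash_render_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ARTICLE_URL = (
    "http://vnexpress.net/tin-tuc/oto-xe-may/"
    "toyota-camry-2016-dinh-loi-tui-khi-khong-bung-3386676.html"
)


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=2.0):
    """Build a render.html URL asking Splash to load page_url, then wait.

    splash_base and wait are assumed defaults, not values from the thread.
    """
    # urlencode percent-escapes the target URL so it survives as one parameter
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


print(splash_render_url(ARTICLE_URL))
```

Putting that URL in `start_urls` (or fetching it with any HTTP client) should give you HTML in which the XPath expressions from the question have something to match. If you go this route, the scrapy-splash package is probably worth a look for tighter integration.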

Source

2016-05-12 11:00:20

Thanks for your help. I'll give it a try.
