2015-11-20

Django dynamic scraper cannot scrape data

I am new to django-dynamic-scraper, and I worked through the open_news example to learn it. I have everything set up, but it keeps showing me the same error: dynamic_scraper.models.DoesNotExist: RequestPageType matching query does not exist.

2015-11-20 18:45:11+0000 [article_spider] ERROR: Spider error processing <GET https://en.wikinews.org/wiki/Main_page> 
Traceback (most recent call last): 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/base.py", line 825, in runUntilCurrent 
    call.func(*call.args, **call.kw) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/task.py", line 645, in _tick 
    taskObj._oneWorkUnit() 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Twisted-15.4.0-py2.7-linux-x86_64.egg/twisted/internet/task.py", line 491, in _oneWorkUnit 
    result = next(self._iterator) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 57, in <genexpr> 
    work = (callable(elem, *args, **named) for elem in iterable) 
--- <exception caught here> --- 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 96, in iter_errback 
    yield next(it) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output 
    for x in result: 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr> 
    return (_set_referer(r) for r in result or()) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr> 
    return (r for r in result or() if _filter(r)) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr> 
    return (r for r in result or() if _filter(r)) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/dynamic_scraper/spiders/django_spider.py", line 378, in parse 
    rpt = self.scraper.get_rpt_for_scraped_obj_attr(url_elem.scraped_obj_attr) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/dynamic_scraper/models.py", line 98, in get_rpt_for_scraped_obj_attr 
    return self.requestpagetype_set.get(scraped_obj_attr=soa) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Django-1.8.5-py2.7.egg/django/db/models/manager.py", line 127, in manager_method 
    return getattr(self.get_queryset(), name)(*args, **kwargs) 
    File "/home/suz/social-network-sujit/local/lib/python2.7/site-packages/Django-1.8.5-py2.7.egg/django/db/models/query.py", line 334, in get 
    self.model._meta.object_name 
dynamic_scraper.models.DoesNotExist: RequestPageType matching query does not exist. 

Answer

This is caused by missing "REQUEST PAGE TYPES". Every entry under "SCRAPER ELEMS" must have its own "Request page type".
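The traceback bottoms out in Scraper.get_rpt_for_scraped_obj_attr(), which expects exactly one RequestPageType row per scraper elem. Conceptually, the failing lookup behaves like this simplified stand-in (plain Python for illustration, not the library's actual code):

```python
# Simplified stand-in for Scraper.get_rpt_for_scraped_obj_attr():
# each scraper elem must map to exactly one request page type.
request_page_types = {
    "base (Article)": "Main Page",
    "title (Article)": "Detail Page 1",
    # "description (Article)" and "url (Article)" not configured yet
}

def get_rpt_for_scraped_obj_attr(attr):
    try:
        return request_page_types[attr]
    except KeyError:
        # dynamic_scraper raises RequestPageType.DoesNotExist here
        raise LookupError("RequestPageType matching query does not exist.")

print(get_rpt_for_scraped_obj_attr("title (Article)"))  # → Detail Page 1
```

As soon as the spider reaches an attribute with no configured page type (here, "description (Article)" or "url (Article)"), the lookup fails with exactly the DoesNotExist error shown in the traceback.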

To fix this, follow these steps:

  1. Log into the admin page (usually http://localhost:8000/admin/)
  2. Go to Home > Dynamic_Scraper > Scrapers > Wikinews Scraper (Article)
  3. Under "REQUEST PAGE TYPES", click "Add another Request page type"
  4. Create one for each of "(base (Article))", "(title (Article))", "(description (Article))" and "(url (Article))"

"Request page type" settings

Set "Content type" to "HTML" for all of them.

Set "Request type" to "Request" for all of them.

Set "Method" to "GET" for all of them.

For "Page type", just assign them in order:

(base (Article))        | Main Page
(title (Article))       | Detail Page 1
(description (Article)) | Detail Page 2
(url (Article))         | Detail Page 3
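If you prefer the shell to clicking through the admin, the same four rows can be created from `python manage.py shell`. This is only a sketch under assumptions: the field names and choice codes ('MP', 'DP1', 'H', 'R') are taken from django-dynamic-scraper's models and may differ between versions, so verify them against the models.py of your installed release before running it.

```python
# Sketch only: run inside `python manage.py shell` of the open_news project.
# Field names and choice codes are assumptions -- check them against
# dynamic_scraper/models.py in your installed version.
from dynamic_scraper.models import Scraper, ScrapedObjAttr, RequestPageType

scraper = Scraper.objects.get(name='Wikinews Scraper (Article)')

# (attribute name, page type code) pairs mirroring the admin settings above
page_types = [
    ('base', 'MP'),          # Main Page
    ('title', 'DP1'),        # Detail Page 1
    ('description', 'DP2'),  # Detail Page 2
    ('url', 'DP3'),          # Detail Page 3
]

for attr_name, pt_code in page_types:
    soa = ScrapedObjAttr.objects.get(name=attr_name,
                                     obj_class=scraper.scraped_obj_class)
    RequestPageType.objects.get_or_create(
        scraper=scraper,
        scraped_obj_attr=soa,
        page_type=pt_code,
        content_type='H',   # HTML
        request_type='R',   # Request
        method='GET',
    )
```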

After the steps above, the "DoesNotExist: RequestPageType" error should be fixed.

However, "ERROR: Mandatory elem title missing!" will then come up.

To fix that, I suggest changing the "Request page type" of every entry under "SCRAPER ELEMS" to "Main Page", including "title (Article)".

Then change the XPaths as follows:

(base (Article))        | //td[@class="l_box"]
(title (Article))       | span[@class="l_title"]/a/@title
(description (Article)) | p/span[@class="l_summary"]/text()
(url (Article))         | span[@class="l_title"]/a/@href
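These XPaths can be sanity-checked outside Scrapy. The snippet below uses the stdlib xml.etree.ElementTree as a stand-in for Scrapy's selectors (DDS itself runs Scrapy's XPath engine; ElementTree cannot express the trailing /@attr and /text() steps, so those become .get() and .text here) against a made-up HTML fragment that mimics the l_box / l_title / l_summary markup:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment mimicking the Main Page markup the XPaths target
# (class names l_box / l_title / l_summary taken from the answer above).
html = """
<table>
  <td class="l_box">
    <span class="l_title"><a href="/wiki/Some_article" title="Some article">Some article</a></span>
    <p><span class="l_summary">A short summary.</span></p>
  </td>
</table>
"""

root = ET.fromstring(html)

# base: //td[@class="l_box"] -- one node per article box;
# the three detail XPaths are evaluated relative to each box
for box in root.findall('.//td[@class="l_box"]'):
    link = box.find('span[@class="l_title"]/a')             # title/url live on this <a>
    title = link.get('title')                               # title: .../a/@title
    url = link.get('href')                                  # url:   .../a/@href
    summary = box.find('p/span[@class="l_summary"]').text   # description: .../text()
    print(title, url, summary)
```

Running it prints one line per article box, which is exactly the per-item extraction the scraper elems describe.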

After all that, run scrapy crawl article_spider -a id=1 -a do_action=yes at the command prompt. You should be able to crawl the "Articles". You can check them from Home > Open_News > Articles.

Enjoy~