
I am new to Heroku PG. What I have done is write a Scrapy crawler, which runs without any errors. The problem is that I want to put all the scraped data into my Heroku Postgres database. For that, I loosely followed this tutorial to load the data scraped by the Scrapy spider into the Heroku PG database.

When I run the crawler on my local machine with scrapy crawl spidername, it runs successfully, but no scraped data is inserted and no table is created in the Heroku database. I do not even get any error in my local terminal. What is wrong with my code?

settings.py

BOT_NAME = 'crawlerconnectdatabase' 

SPIDER_MODULES = ['crawlerconnectdatabase.spiders'] 
NEWSPIDER_MODULE = 'crawlerconnectdatabase.spiders' 

DATABASE = {'drivername': 'postgres', 
     'host': 'ec2-54-235-250-41.compute-1.amazonaws.com', 
     'port': '5432', 
     'username': 'dtxwjcycsaweyu', 
     'password': '***', 
     'database': 'ddcir2p1u2vk07'} 

items.py

from scrapy.item import Item, Field 

class CrawlerconnectdatabaseItem(Item): 
    name = Field() 
    url = Field() 
    title = Field() 
    link = Field() 
    page_title = Field() 
    desc_link = Field() 
    body = Field() 
    news_headline = Field() 
    pass 

models.py

from sqlalchemy import create_engine, Column, Integer, String 
from sqlalchemy.ext.declarative import declarative_base 
from sqlalchemy.engine.url import URL 
import settings 

DeclarativeBase = declarative_base() 


def db_connect(): 

    return create_engine(URL(**settings.DATABASE)) 


def create_deals_table(engine): 

    DeclarativeBase.metadata.create_all(engine) 


class Deals(DeclarativeBase):
    """Sqlalchemy deals model"""
    __tablename__ = "news_data"

    id = Column(Integer, primary_key=True) 
    body = Column('body', String) 

pipelines.py

from sqlalchemy.orm import sessionmaker 
from models import Deals, db_connect, create_deals_table 

class CrawlerconnectdatabasePipeline(object): 

    def __init__(self):
        engine = db_connect()
        create_deals_table(engine)
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        session = self.Session()
        deal = Deals(**item)

        try:
            session.add(deal)
            session.commit()
        except:
            session.rollback()
            raise
        finally:
            session.close()

        return item

Spider

As for the code of the Scrapy spider, you will find it here.
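The spider itself is only linked, but for the pipeline to receive anything it has to yield CrawlerconnectdatabaseItem objects. A minimal hypothetical spider (the name, URL and XPath below are placeholders, not the asker's actual code) would look roughly like this with the current Scrapy API:

import scrapy
from crawlerconnectdatabase.items import CrawlerconnectdatabaseItem

class NewsSpider(scrapy.Spider):
    name = "spidername"                       # placeholder; run with `scrapy crawl spidername`
    start_urls = ["http://example.com/news"]  # placeholder URL

    def parse(self, response):
        item = CrawlerconnectdatabaseItem()
        # only `body` is mapped to a column on the Deals model
        item['body'] = " ".join(response.xpath('//p/text()').getall())
        yield item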

Answer


You need to add ITEM_PIPELINES = {'crawlerconnectdatabase.pipelines.CrawlerconnectdatabasePipeline': 300,} to your settings.py.
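Scrapy only loads pipelines that are registered under ITEM_PIPELINES; without that setting, CrawlerconnectdatabasePipeline is never instantiated, which is why no table is created and no rows are inserted. As a minimal sketch, settings.py would gain this block (300 is simply the pipeline's priority):

ITEM_PIPELINES = {
    'crawlerconnectdatabase.pipelines.CrawlerconnectdatabasePipeline': 300,
}

Once it is registered, Scrapy instantiates the pipeline at startup (which runs create_deals_table and creates the news_data table) and passes every scraped item through process_item.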


Okay, I will try that.... –