2017-06-20

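Email with Scrapy basics

I have the following code, and I need the exported items sent to me by email so that I can read the news. I know about the Scrapy 1.4 email docs, but I can't find enough examples to finish my code.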

What is a good way to get started with this? If there isn't one, could someone point me to some examples?

import scrapy
from collections import OrderedDict

from scrapy.spiders import XMLFeedSpider
from tickers.items import tickersItem


class Spider(XMLFeedSpider):
    name = "EmperyScraper"
    allowed_domains = ["yahoo.com"]
    start_urls = ('https://feeds.finance.yahoo.com/rss/2.0/headline?s=UNXL,UQM,URRE,UUUU,VBLT,VGZ,VKTX,VTGN,WINT,XGTI,XTNT,XXII,ZSAN',)
    itertag = 'item'

    def parse_node(self, response, node):
        item = OrderedDict()
        item['Title'] = node.xpath('title/text()').extract_first()
        item['PublishDate'] = node.xpath('pubDate/text()').extract_first()
        item['Description'] = node.xpath('description/text()').extract_first()
        item['Link'] = node.xpath('link/text()').extract_first()
        yield item

Update: I'm also looking for a way to automate this!
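One common way to automate the crawl is to let the operating system's scheduler (cron on Linux/macOS, Task Scheduler on Windows) start it. A minimal sketch, assuming the project lives in a hypothetical `/path/to/tickers` directory and the `scrapy` command is on the PATH: each run starts `scrapy crawl EmperyScraper` in a fresh child process, which sidesteps the fact that Twisted's reactor cannot be restarted inside one Python process (a second `CrawlerProcess.start()` raises `ReactorNotRestartable`).

```python
import subprocess

# Hypothetical location of the Scrapy project (the directory containing scrapy.cfg).
PROJECT_DIR = "/path/to/tickers"


def build_crawl_command(spider_name):
    """Return the command line that runs one spider through the Scrapy CLI."""
    return ["scrapy", "crawl", spider_name]


def run_crawl(spider_name, cwd=PROJECT_DIR, runner=subprocess.run):
    """Run the spider once in a child process and return its exit code.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without Scrapy installed.
    """
    result = runner(build_crawl_command(spider_name), cwd=cwd)
    return result.returncode
```

Saved as, say, `run_crawl.py` with a final line `run_crawl("EmperyScraper")`, it could be scheduled hourly with a crontab entry such as `0 * * * * /usr/bin/python3 /path/to/run_crawl.py` (both paths are placeholders).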

Edit: Below is the code in my pipelines.py file. When I run this script, all I get is >>>y and nothing else. Really puzzling:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart


class TickersPipeline(object):
    def send_mail(self, message, title):
        gmailUser = '[email protected]'
        gmailPassword = 'example'
        recipient = '[email protected]'

        msg = MIMEMultipart()
        msg['From'] = gmailUser
        msg['To'] = recipient
        msg['Subject'] = title
        msg.attach(MIMEText(message))

        mailServer = smtplib.SMTP('smtp.gmail.com', 587)
        mailServer.ehlo()
        mailServer.starttls()
        mailServer.ehlo()
        mailServer.login(gmailUser, gmailPassword)
        mailServer.sendmail(gmailUser, recipient, msg.as_string())
        # quit() sends the SMTP QUIT command before closing the connection
        mailServer.quit()
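A likely reason no mail arrives is that `send_mail` is never called: Scrapy calls a pipeline's `process_item` once per item and, if they exist, `open_spider` and `close_spider`, and the pipeline must also be listed in `ITEM_PIPELINES`. A minimal sketch of a pipeline that collects the items and mails one digest (title and link per item) when the spider closes; the class name `EmailDigestPipeline`, the SMTP settings and the addresses are placeholders, and the `smtp_factory` parameter exists only so the class can be tested without a mail server.

```python
import smtplib
from email.mime.text import MIMEText


class EmailDigestPipeline(object):
    """Collect every scraped item and mail them as one message when the spider closes."""

    # Placeholder settings: replace with your own server, account and recipient.
    SMTP_HOST = "smtp.gmail.com"
    SMTP_PORT = 587
    USER = "sender@example.com"
    PASSWORD = "app-password"
    RECIPIENT = "you@example.com"

    def __init__(self, smtp_factory=smtplib.SMTP):
        # smtp_factory is injectable only so the class can be tested offline.
        self.smtp_factory = smtp_factory
        self.items = []

    def process_item(self, item, spider):
        # Called by Scrapy once per item; returning it passes it on to later pipelines.
        self.items.append(dict(item))
        return item

    def close_spider(self, spider):
        # Called by Scrapy once when the crawl ends; send nothing if nothing was scraped.
        if not self.items:
            return
        body = "\n\n".join(
            f"{item.get('Title')}\n{item.get('Link')}" for item in self.items
        )
        self.send_mail(body, f"{spider.name}: {len(self.items)} item(s)")

    def send_mail(self, message, title):
        msg = MIMEText(message)
        msg["From"] = self.USER
        msg["To"] = self.RECIPIENT
        msg["Subject"] = title
        # The with-block sends QUIT on exit; starttls() and login() issue EHLO as needed.
        with self.smtp_factory(self.SMTP_HOST, self.SMTP_PORT) as server:
            server.starttls()
            server.login(self.USER, self.PASSWORD)
            server.sendmail(self.USER, [self.RECIPIENT], msg.as_string())
```

To enable it, add something like `ITEM_PIPELINES = {'tickers.pipelines.EmailDigestPipeline': 300}` to the project's settings.py (the module path is assumed from the `tickers` project name). Sending once in `close_spider` blocks the crawler only for that single SMTP exchange at the end, rather than once per item.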

Apart from what is covered in https://stackoverflow.com/a/11411162/131187, what else do you need to know? –


It doesn't work. For some reason, when I use similar code the scraping works, but I never receive an email. – Friezan


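I meant the code that starts with 'from scrapy.mail import MailSender'. OK, start small. I assume 'parse_node' yields items that are useful for the email. In the last two of the three lines just before the 'yield', try putting your own email address (inside a list) in the 'to' and 'cc' fields, and leave the rest as it is. See what happens. If that works, try pushing the contents of 'item' into 'body' and see how far you get. –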
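The comment refers to Scrapy's `scrapy.mail.MailSender`, whose `send()` method takes `to` and `cc` lists plus a `body`; it sends through Twisted's reactor, so it is meant to be used while the crawler is running. As a plainly swapped-in alternative that uses only the standard library, the sketch below puts the fields of each scraped `item` into the body and addresses the message to `to` and `cc` lists. `build_digest` and `send_digest` are hypothetical helper names, and every address and host is a placeholder.

```python
import smtplib
from email.message import EmailMessage


def build_digest(items, sender, to, cc=(), subject="Scraped news"):
    """Build one plain-text message listing every field of every item."""
    blocks = []
    for item in items:
        # One "Field: value" line per field; items are separated by a blank line.
        blocks.append("\n".join(f"{key}: {value}" for key, value in item.items()))
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(to)
    if cc:
        msg["Cc"] = ", ".join(cc)
    msg["Subject"] = subject
    msg.set_content("\n\n".join(blocks))
    return msg


def send_digest(msg, host, port, user, password, smtp_factory=smtplib.SMTP):
    """Send over STARTTLS; send_message() delivers to both the To and Cc addresses."""
    with smtp_factory(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Because `send_message()` reads the recipients from the To, Cc and Bcc headers when no explicit list is given, putting your own address in both lists is enough to test delivery, as the comment suggests.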

Answer


Below is a rough guide, pieced together from the basic tutorial that Scrapy provides.

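import smtplib
from email.mime.text import MIMEText

import scrapy
from scrapy.crawler import CrawlerProcess

# Placeholder SMTP settings: replace with your own server and credentials
my_server = 'smtp.example.com'
my_user = 'user'
my_pswd = 'password'
my_email = 'you@example.com'


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/page/1/',
            'http://quotes.toscrape.com/page/2/',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'quotes-%s.html' % page
        # sendmail() expects the full message text, not just a file name,
        # so build a small message (here the body is simply the page URL).
        # Note that smtplib blocks the crawler while it sends.
        msg = MIMEText(response.url)
        msg['From'] = my_email
        msg['To'] = my_email
        msg['Subject'] = filename
        server = smtplib.SMTP(my_server, port=587)
        server.starttls()
        server.login(my_user, my_pswd)
        server.sendmail(my_email, [my_email], msg.as_string())
        server.quit()


process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(QuotesSpider)
process.start()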