Platform: Debian 8 + Python 3.4 + Scrapy 1.3.2.
Here is my spider, which downloads some URLs from yahoo.com. Why can't the error info be logged to the specified file?
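import scrapy
import csv


class TestSpider(scrapy.Spider):
    name = "quote"
    allowed_domains = ["yahoo.com"]
    # Placeholder URLs; the real list is elided here.
    start_urls = ['url1', 'url2', 'url3', ..., 'url100']

    def parse(self, response):
        # The file name is taken from the query string, e.g. "...?s=GLU" -> "GLU".
        filename = response.url.split("=")[1]
        with open('/tmp/' + filename + '.csv', 'wb') as f:
            f.write(response.body)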
Some error info shows up when I run it:
2017-02-19 21:28:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response
<404 https://chart.yahoo.com/table.csv?s=GLU>: HTTP status code is not handled or not allowed
https://chart.yahoo.com/table.csv?s=GLU is one of the start_urls.
Now I want to catch the error info.
import scrapy
import csv
import logging
from scrapy.utils.log import configure_logging

# Keep Scrapy from installing its own root handler, then send log records to a file.
configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='/tmp/log.txt',
    format='%(levelname)s: %(message)s',
    level=logging.INFO
)


class TestSpider(scrapy.Spider):
    name = "quote"
    allowed_domains = ["yahoo.com"]
    # Placeholder URLs; the real list is elided here.
    start_urls = ['url1', 'url2', 'url3', ..., 'url100']

    def parse(self, response):
        filename = response.url.split("=")[1]
        with open('/tmp/' + filename + '.csv', 'wb') as f:
            f.write(response.body)
Why can't an error message such as
2017-02-19 21:28:27 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <404 https://chart.yahoo.com/table.csv?s=GLU>: HTTP status code is not handled or not allowed
be logged to /home/log.txt?
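One thing that may be worth checking here (an assumption, not a confirmed diagnosis for this setup): `logging.basicConfig` silently does nothing when the root logger already has a handler, so the file is never created if something else configured logging first. The stdlib-only sketch below shows that behaviour and an alternative that attaches a `FileHandler` explicitly. The function names and the `demo.httperror` logger name are hypothetical, and no Scrapy code is involved.

```python
import logging
import os
import tempfile


def basic_config_after_handler(log_path):
    """Return True if basicConfig created log_path although the root
    logger already had a handler installed."""
    root = logging.getLogger()
    saved_handlers, saved_level = root.handlers[:], root.level
    try:
        # Simulate a framework that installed a root handler earlier.
        root.handlers = [logging.NullHandler()]
        # With a handler already present, this call is a no-op.
        logging.basicConfig(filename=log_path, level=logging.INFO)
        logging.getLogger("demo.httperror").info("Ignoring response")
        return os.path.exists(log_path)
    finally:
        root.handlers, root.level = saved_handlers, saved_level


def explicit_file_handler(log_path):
    """Attach a FileHandler to the root logger directly, log one record
    from a child logger, and return the resulting file content."""
    root = logging.getLogger()
    saved_level = root.level
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    try:
        logging.getLogger("demo.httperror").info("Ignoring response")
    finally:
        root.removeHandler(handler)
        handler.close()
        root.setLevel(saved_level)
    with open(log_path) as f:
        return f.read()


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    print(basic_config_after_handler(os.path.join(tmp_dir, "basic.txt")))
    print(explicit_file_handler(os.path.join(tmp_dir, "explicit.txt")))
```

Scrapy also has a `LOG_FILE` setting for sending its log to a file; whether that or an explicit handler fits better depends on how the spider is launched.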
Thanks to eLRuLL, I added handle_httpstatus_list = [404].
import scrapy
import csv
import logging
from scrapy.utils.log import configure_logging

# Keep Scrapy from installing its own root handler, then send log records to a file.
configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='/home/log.txt',
    format='%(levelname)s: %(message)s',
    level=logging.INFO
)


class TestSpider(scrapy.Spider):
    # Let 404 responses through to parse() instead of having them filtered out.
    handle_httpstatus_list = [404]
    name = "quote"
    allowed_domains = ["yahoo.com"]
    # Placeholder URLs; the real list is elided here.
    start_urls = ['url1', 'url2', 'url3', ..., 'url100']

    def parse(self, response):
        filename = response.url.split("=")[1]
        with open('/tmp/' + filename + '.csv', 'wb') as f:
            f.write(response.body)
The error info still can't be logged to /home/log.txt. Why?
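A side effect of `handle_httpstatus_list = [404]` is worth noting: Scrapy now passes 404 responses to the callback instead of dropping them in the HTTP error middleware, so the "Ignoring response" line is no longer emitted for them, and `parse` as written would save the 404 body as a CSV. A minimal sketch of a helper that `parse` could delegate to, so the callback logs the failure itself, is below. `save_or_log` is a hypothetical name; it only assumes the response object has `.status`, `.url` and `.body`, and the example fakes one with `SimpleNamespace` rather than using Scrapy.

```python
import logging
import os
import tempfile
from types import SimpleNamespace  # used only to fake a response below


def save_or_log(response, out_dir, logger):
    """Write a 200 response to <out_dir>/<symbol>.csv and log anything else.

    Returns the path written, or None when the response was only logged.
    """
    if response.status != 200:
        logger.error("HTTP %d for %s", response.status, response.url)
        return None
    # Same naming rule as the spider: "...?s=GLU" -> "GLU.csv".
    symbol = response.url.split("=")[1]
    path = os.path.join(out_dir, symbol + ".csv")
    with open(path, "wb") as f:
        f.write(response.body)
    return path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    log = logging.getLogger("demo.spider")
    missing = SimpleNamespace(
        status=404, url="https://chart.yahoo.com/table.csv?s=GLU", body=b""
    )
    print(save_or_log(missing, out_dir, log))
```

Inside the spider this would be called as `save_or_log(response, '/tmp', self.logger)` from `parse`, assuming the log file itself is already receiving records.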