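Using Scrapy on a search engine with keywords from a file
I am trying to use Scrapy to query a search engine with a list of keywords read from a file.
Here is the output from Scrapy, which I take to be an error: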
Redirecting (301) to <GET https://duckduckgo.com/?q=> from <GET https://www.duckduckgo.com/?q=>
2014-07-18 16:23:39-0500 [wnd] DEBUG: Crawled (200) <GET https://duckduckgo.com/?q=> (referer: None)
Here is the code:
import re
import os
import sys
import json
from scrapy.spider import Spider
from scrapy.selector import Selector

searchstrings = "wnd.config"
searchoutcome = "searchResults.json"

class wndSpider(Spider):
    name = "wnd"
    allowed_domains = ['google.com']
    url_prefix = []
    #start_urls = ['https://www.google.com/search?q=']
    start_urls = ['https://www.duckduckgo.com/?q=']
    for line in open(searchstrings, 'r').readlines():
        url_prefix = start_urls[0] + line
        #url = url_prefix[0] + line
        #f = open(searchstrings
        #start_urls = [url_prefix]
        #for f in f.readlines():
        #f.close()

    def parse(self, response):
        sel = Selector(response)
        goog_search_list = sel.xpath('//h3/a/@href').extract()
        #goog_search_list = [re.search('q=(.*&sa',n).group(1) for n in goog_search_list]
        #if re.search('q=(.*)&sa',n)]
        #title = sel.xpath('//title/text()').extract()
        #if len(title)>0: title = tilstle[0]
        #contents = sel.xpath('/html/head/meta[@name="description"] /@content').extract()
        #if len(contents)>0: contents = contents[0]
        ## dump output
        #with open(searchoutcome, "w") as outfile:
        #json.dump(searchoutcome ,outfile, indent=4)
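Why do you think this is an error? What did you expect? The first line of output shows an HTTP 301 status code, which is just a redirect. The next one is 200, which means success. So I don't see an error. The behavior may not be what you expected, but you haven't told us what you do expect, so there is no way to help. ;-) – Achim
`parse()` has the wrong indentation level. – kev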
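For what it's worth, the log shows only the bare `?q=` URL being requested, which fits the posted code: the loop overwrites `url_prefix` on every pass and never assigns to `start_urls`, each line still carries its trailing newline, and `allowed_domains` names google.com while the crawl targets duckduckgo.com. Below is a rough sketch of one way to restructure it, not a definitive fix. The names `build_search_urls`, `SEARCH_URL`, `WndSpider` and the `keyword_file` argument are made up for illustration, it assumes Python 3 and a recent Scrapy, and the `//h3/a/@href` XPath is carried over from the question and may not match the markup DuckDuckGo actually serves.

```python
from urllib.parse import quote_plus

try:
    import scrapy
except ImportError:  # Scrapy not installed: only the URL helper is usable
    scrapy = None

# Base URL taken from the redirect target shown in the question's log.
SEARCH_URL = "https://duckduckgo.com/?q="


def build_search_urls(lines, base=SEARCH_URL):
    """Turn an iterable of keyword lines into one search URL per keyword.

    Hypothetical helper: strips the trailing newline, skips blank lines,
    and URL-encodes each keyword.
    """
    urls = []
    for line in lines:
        keyword = line.strip()
        if keyword:
            urls.append(base + quote_plus(keyword))
    return urls


if scrapy is not None:

    class WndSpider(scrapy.Spider):
        name = "wnd"
        # Must match the site being crawled, not google.com.
        allowed_domains = ["duckduckgo.com"]

        def __init__(self, keyword_file="wnd.config", *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Build start_urls per instance, so the file is read when the
            # spider is created rather than while the class body runs.
            with open(keyword_file) as f:
                self.start_urls = build_search_urls(f)

        # parse() sits at class level, as kev's comment suggests.
        def parse(self, response):
            # XPath carried over from the question; adjust to the real page.
            for href in response.xpath("//h3/a/@href").getall():
                yield {"query_url": response.url, "link": href}
```

If this is saved as, say, `wnd_spider.py`, something like `scrapy runspider wnd_spider.py -a keyword_file=wnd.config -o searchResults.json` should write the yielded items to a JSON file, which would replace the hand-rolled `json.dump` from the commented-out lines.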