Here is my code. It looks correct to me, but it does not work properly. Could someone help me find the error?
from urllib.parse import urlparse  # Python 3; on Python 2 use "from urlparse import urlparse"

from scrapy.spiders import SitemapSpider

# ContentItems and the process_* helpers are defined elsewhere in the project.

HEADER_XPATH = ['//h1[@class="story-body__h1"]//text()']
AUTHOR_XPATH = ['//span[@class="byline__name"]//text()']
PUBDATE_XPATH = ['//div/@data-datetime']
TAGS_XPATH = ['']  # the name must match the TAGS_XPATH lookup in parse_page
CATEGORY_XPATH = ['//span[@rev="news|source"]//text()']
TEXT = ['//div[@property="articleBody"]//p//text()']
INTERLINKS = ['//div[@class="story-body__link"]//p//a/@href']
DATE_FORMAT_STRING = '%Y-%m-%d'


class BBCSpider(SitemapSpider):  # sitemap_urls is only read by SitemapSpider
    name = "bbc"
    allowed_domains = ["bbc.com"]
    # Each entry must point to an XML sitemap or a robots.txt file.
    sitemap_urls = [
        'http://www.bbc.com/news/sitemap/',
        'http://www.bbc.com/news/technology/',
        'http://www.bbc.com/news/science_and_environment/',
    ]
    # Without a rule, matching URLs are sent to the default callback, parse().
    sitemap_rules = [('', 'parse_page')]

    def parse_page(self, response):
        items = []
        item = ContentItems()
        item['title'] = process_singular_item(self, response, HEADER_XPATH, single=True)
        item['resource'] = urlparse(response.url).hostname
        item['author'] = process_array_item(self, response, AUTHOR_XPATH, single=False)
        item['pubdate'] = process_date_item(self, response, PUBDATE_XPATH, DATE_FORMAT_STRING, single=True)
        item['tags'] = process_array_item(self, response, TAGS_XPATH, single=False)
        item['category'] = process_array_item(self, response, CATEGORY_XPATH, single=False)
        item['article_text'] = process_article_text(self, response, TEXT)
        item['external_links'] = process_external_links(self, response, INTERLINKS, single=False)
        item['link'] = response.url
        items.append(item)
        return items
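
For context on why a callback named parse_page might never run: Scrapy's SitemapSpider sends each sitemap URL to the callback of the first sitemap_rules entry whose regex matches it, and the documented default rule is ('', 'parse'). The snippet below is only a standard-library imitation of that lookup, not Scrapy's own code; pick_callback and the article URL are hypothetical names made up for illustration.

```python
import re

# Scrapy's documented default: every sitemap URL goes to a method named "parse".
DEFAULT_RULES = [("", "parse")]


def pick_callback(url, rules):
    """Return the callback name of the first rule whose regex matches url."""
    for pattern, callback_name in rules:
        if re.search(pattern, url):
            return callback_name
    return None  # no rule matched, so the URL would be skipped


# Hypothetical article URL, used only to exercise the lookup.
article = "http://www.bbc.com/news/technology-12345678"

print(pick_callback(article, DEFAULT_RULES))                  # parse
print(pick_callback(article, [(r"/news/", "parse_page")]))    # parse_page
```

With only the default rule, the lookup names parse, so a spider that defines parse_page alone would never have that method called.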
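What is the problem? Could you explain what goes wrong? What is the input? What is the desired output? What are you trying to do? – MooingRawr
The problem is that when I run my code, nothing happens. It doesn't go through the pages! I think my mistake is in the variables @MooingRawr – nik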
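
One cheap way to test the "mistake is in the variables" theory is to sanity-check the XPath strings before running the spider. The sketch below uses only the standard library and catches just two kinds of slip: an empty expression and a quoted literal that is never closed, as in //span[@rev="news|source""]. It does not validate XPath in general, and has_balanced_quotes and xpath_problem are hypothetical helper names.

```python
def has_balanced_quotes(xpath):
    """Return True if every quoted literal in the expression is closed."""
    open_quote = None
    for ch in xpath:
        if open_quote is None:
            if ch in ('"', "'"):
                open_quote = ch  # a literal starts here
        elif ch == open_quote:
            open_quote = None  # the literal ends here
    return open_quote is None


def xpath_problem(xpath):
    """Return a short description of an obvious problem, or None."""
    if not xpath.strip():
        return "empty expression"
    if not has_balanced_quotes(xpath):
        return "unbalanced quotes"
    return None


candidates = {
    "HEADER_XPATH": '//h1[@class="story-body__h1"]//text()',
    "TAGS_XPATH": '',
    "CATEGORY_XPATH": '//span[@rev="news|source""]//text()',
}

for name, expr in candidates.items():
    problem = xpath_problem(expr)
    if problem:
        print(f"{name}: {problem}")
```

A check like this will not explain a spider that never requests a page, but it rules out quoting and empty-string mistakes quickly.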