2017-08-07 44 views

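Initialize class (scrapy item) with empty strings

I'm inserting a scrapy item class that I've defined in items.py into a mongodb, but I need it to insert all of the fields of the class, so that fields with no value are still added to the database as empty. Name and Price under the Listing class will always be inserted as empty, but I want to keep pipelines.py clean so that I can easily switch to other items. Currently, if I don't set each part of the class to an empty string, that part is not added when the item is inserted into the db.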

Do I need to initialize each member to an empty dictionary? Like Title = scrapy.Field({})

items.py

import scrapy


class Listing(scrapy.Item):
    Title = scrapy.Field()
    Address = scrapy.Field()
    Price = scrapy.Field()
    Name = scrapy.Field()

pipelines.py

def process_item(self, item, spider):
    # Price and Name will always be empty
    item['Price'] = ''
    item['Name'] = ''
    self.collection.insert_one(dict(item))
    # Scrapy expects process_item to return the item (or raise DropItem)
    return item

Answer


You can use scrapy's ItemLoader:

from scrapy.loader import ItemLoader
from scrapy.item import Item, Field


class Listing(Item):
    title = Field()
    address = Field()
    price = Field()
    name = Field()

class MyLoader(ItemLoader): 
    default_item_class = Listing 

Then:

loader = MyLoader(response=response) 
loader.add_xpath('title', '//some/xpath/that/finds/nothing') 
loader.load_item() 
# {'title': ['']}
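
If some fields still end up unset on the item, another option is to fill them in generically inside the pipeline, so that pipelines.py never names a field. Note that arguments passed to scrapy.Field(...) are only metadata, so Title = scrapy.Field({}) does not give the field a default value. The following is a minimal sketch rather than a definitive implementation: the names with_all_fields and ListingPipeline are made up for illustration, and it assumes self.collection is set up elsewhere, as in the question. It relies only on the fields mapping that every scrapy.Item class exposes.

```python
def with_all_fields(item, default=""):
    """Return a plain dict holding every declared field of `item`.

    `item.fields` is the mapping of declared field names that scrapy.Item
    classes expose; fields the spider never set receive `default`.
    (Hypothetical helper, not part of Scrapy.)
    """
    record = {name: default for name in item.fields}
    record.update(dict(item))  # values that were actually scraped win
    return record


class ListingPipeline:
    # Assumption: self.collection is created elsewhere (for example in
    # open_spider), exactly as in the question's pipeline.

    def process_item(self, item, spider):
        # No field names appear here, so the pipeline works for any Item class
        self.collection.insert_one(with_all_fields(item))
        return item
```

With the Listing class from the question, an item that only has Title and Address set would be stored with Price and Name as empty strings, and switching to another item class needs no change in the pipeline.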