2017-10-08 25 views

Scrapy: how to yield to an existing object in MongoDB

I want to save my data, edit it, and then save it again to the same object (is it a dictionary?).

I save the data to MongoDB with yield {'Id': id, 'Name': name, 'Age': age}.

Afterwards, I read the data back with the following code:

import scrapy
from pymongo import MongoClient

class example(scrapy.Spider):
    name = 'example'
    allowed_domains = ['example.com']
    start_urls = ['https://example.com']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider finish its own setup first
        super().__init__(*args, **kwargs)
        self.db = MongoClient()
        # Cursor over every document in the collection
        self.datab = self.db.database_name.collection_name.find({})

    def parse(self, response):
        for data in self.datab:
            name = data['Name']
            print(name)

The code above prints every name in the database. But suppose I want to edit a name, like this:

for data in self.datab:
    name = data['Name']
    if name == 'dani':
        name = 'daniel'
        yield {'Name': name}

I want this to be written to the same object as before, not saved as a new one.

~~~~~~~~~~~~~~~~~~~~~~

Edit: pipelines.py:

from pymongo import MongoClient
# Note: scrapy.conf is deprecated in newer Scrapy releases
from scrapy.conf import settings

class MongoDBPipeline(object):
    def __init__(self):
        connection = MongoClient(settings['MONGODB_SERVER'], settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        # Every yielded item is inserted as a new document
        self.collection.insert(dict(item))
        return item

settings.py:

ITEM_PIPELINES = { 
    'quotes_spider.pipelines.MongoDBPipeline': 300, 
} 
MONGODB_SERVER = 'localhost' 
MONGODB_PORT = 27017 
MONGODB_DB = 'database_name' 
MONGODB_COLLECTION = 'collection_name' 
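
For readers on a newer Scrapy release, where importing `scrapy.conf` is deprecated, the same four settings can be read through the `from_crawler` hook instead. The following is only a sketch under that assumption and not the asker's code: the attribute names on the instance are made up for illustration, and `insert_one` stands in for the older `insert` call.

```python
class MongoDBPipeline:
    """Sketch: same pipeline, but settings arrive through from_crawler."""

    def __init__(self, server, port, db_name, collection_name):
        # Attribute names here are illustrative, not part of any Scrapy API
        self.server = server
        self.port = port
        self.db_name = db_name
        self.collection_name = collection_name
        self.client = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook and passes the running crawler
        s = crawler.settings
        return cls(
            server=s.get('MONGODB_SERVER', 'localhost'),
            port=int(s.get('MONGODB_PORT', 27017)),
            db_name=s.get('MONGODB_DB'),
            collection_name=s.get('MONGODB_COLLECTION'),
        )

    def open_spider(self, spider):
        # Imported here so the class can be constructed without pymongo loaded
        from pymongo import MongoClient
        self.client = MongoClient(self.server, self.port)
        self.collection = self.client[self.db_name][self.collection_name]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        # Still a plain insert: every item becomes a new document
        self.collection.insert_one(dict(item))
        return item
```

The `ITEM_PIPELINES` entry and the `MONGODB_*` values in settings.py stay exactly as shown above.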

Are you using a pipeline to save this? If so, please post the pipeline code as well. –

@Tarun Lalwani – daniel

Answers

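In the pipeline, you need to change

self.collection.insert(dict(item))

to

# Work on a plain dict copy so the yielded item is left untouched
doc = dict(item)
if "_id" in doc:
    # Existing document: update only the fields that were yielded
    _id = doc.pop("_id")
    self.collection.update_one({"_id": _id}, {"$set": doc})
else:
    # New document (insert() is deprecated in PyMongo 3 and removed in 4)
    self.collection.insert_one(doc)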

And change the loop in the spider to this:

for data in self.datab:
    name = data['Name']
    if name == 'dani':
        name = 'daniel'
        # Include the document's _id so the pipeline updates it
        yield {'_id': data['_id'], 'Name': name}

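So basically, when you want to update, you yield the _id together with the updated fields. When you want to insert, you yield the item without an _id.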
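
Put together, the update-or-insert rule can be sketched as a small pipeline class. This is an illustration under assumptions, not a drop-in replacement: the class name `UpsertByIdPipeline` and the constructor that takes a collection directly are hypothetical, and `collection` is assumed to expose `update_one` and `insert_one` the way a PyMongo collection does.

```python
class UpsertByIdPipeline:
    """Sketch: update when the item carries an _id, insert otherwise."""

    def __init__(self, collection):
        # Assumed to behave like a pymongo Collection (update_one / insert_one)
        self.collection = collection

    def process_item(self, item, spider):
        # Copy first, so popping _id does not alter the item passed downstream
        doc = dict(item)
        _id = doc.pop('_id', None)
        if _id is not None:
            if doc:
                # Only the yielded fields are overwritten; others stay as stored
                self.collection.update_one({'_id': _id}, {'$set': doc})
        else:
            self.collection.insert_one(doc)
        return item
```

With this in place, yielding {'_id': data['_id'], 'Name': 'daniel'} changes only the Name field of that document, while yielding {'Id': id, 'Name': name, 'Age': age} still creates a new one. In a real project the collection would be built from the MONGODB_* settings, as in the asker's pipeline.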