2016-03-16 81 views

I made a pipeline to process the results returned by Scrapy:

PARSE = 'api.parse.com'
PORT = 443

However, I cannot find the correct way to POST the data to Parse, because every time it creates undefined objects in my Parse database. The broken code:

class Newscrawlbotv01Pipeline(object):
    def process_item(self, item, spider):
        for data in item:
            if not data:
                raise DropItem("Missing data!")
        connection = httplib.HTTPSConnection(
            settings['PARSE'],
            settings['PORT']
        )
        connection.connect()
        connection.request('POST', '/1/classes/articlulos', json.dumps({item}), {
            "X-Parse-Application-Id": "XXXXXXXXXXXXXXXX",
            "X-Parse-REST-API-Key": "XXXXXXXXXXXXXXXXXXX",
            "Content-Type": "application/json"
        })
        log.msg("Question added to PARSE !", level=log.DEBUG, spider=spider)
        return item
        #self.collection.update({'url': item['url']}, dict(item), upsert=True)

Example:

2016-03-16 20:13:19 [scrapy] ERROR: Error processing {'image': 'http://eedl.eodi.org/wp-content/uploads/sites/3/2016/01/Figaro.png', 
'language': 'FR', 
'publishedDate': u'2016-03-16T18:52:24+01:00', 
'publisher': 'Le Figaro', 
'theme': 'Actualites', 
'title': u'Interpellations Paris: \xable niveau de menace reste tr\xe8s \xe9lev\xe9\xbb selon Hollande', 
'url': u'http://www.lefigaro.fr/flash-actu/2016/03/16/97001-20160316FILWWW00315-interpellations-paris-la-menace-reste-tres-elevee-selon-hollande.php'} 
Traceback (most recent call last): 
    File "c:\python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks 
    current.result = callback(current.result, *args, **kw) 
    File "C:\Users\simon\Documents\NewsSwipe\PROTOTYPE\v0.1\NewsCrawlBotV0_1\NewsCrawlBotV0_1\pipelines.py", line 49, in process_item 
    connection.request('POST', '/1/classes/articlulos', json.dumps({data}), { 
    File "c:\python27\lib\json\__init__.py", line 243, in dumps 
    return _default_encoder.encode(obj) 
    File "c:\python27\lib\json\encoder.py", line 207, in encode 
    chunks = self.iterencode(o, _one_shot=True) 
    File "c:\python27\lib\json\encoder.py", line 270, in iterencode 
    return _iterencode(o, 0) 
    File "c:\python27\lib\json\encoder.py", line 184, in default 
    raise TypeError(repr(o) + " is not JSON serializable") 
TypeError: set(['theme']) is not JSON serializable 
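The root cause of that traceback: in Python, braces around a single expression build a *set* literal, and `json` cannot serialize sets. Converting the Scrapy item to a plain dict before dumping avoids this. A minimal demonstration (the field values are stand-ins for the scraped item):

```python
import json

item = {'theme': 'Actualites', 'language': 'FR'}  # stand-in for a Scrapy item

# {data} builds a one-element set -- exactly the object in the traceback:
try:
    json.dumps({'theme'})
except TypeError as e:
    print('not serializable:', e)

# dict(item) serializes cleanly, so the request body becomes valid JSON:
print(json.dumps(dict(item)))
```

So in the pipeline, `json.dumps({item})` should be `json.dumps(dict(item))`.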

This is now a duplicate of http://stackoverflow.com/questions/36045159/scrapy-pipeline-to-parse. Maybe you should accept the answer about creating the pipeline, and then we can continue with your pipeline problem. – eLRuLL


Sorry, but how do you accept an answer? I'm new to StackOverflow. Oh, wait a moment, I see it now. –


I see you did it; remember to leave the question up as before. – eLRuLL

Answer


You need to use a Pipeline: its process_item method receives every output item, and there you can do whatever you want with the item.
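A minimal sketch of such a pipeline (the class name and the way the item is handled here are just placeholders):

```python
class PrintPipeline(object):
    """Minimal Scrapy pipeline: process_item is called for every scraped item."""

    def process_item(self, item, spider):
        # Do whatever you want with the item here (store it, POST it, ...).
        print(dict(item))
        # Returning the item passes it on to any later pipelines.
        return item
```

Enable it in `settings.py` via `ITEM_PIPELINES` so Scrapy actually calls it.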


Scrapy has a built-in feed exporter for JSON files; all you need to do is add

-o example.json 

to your scrapy command line. See the docs here.
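For example, assuming a spider named `news` (the spider name here is a placeholder):

```shell
# Run the spider and export every scraped item to a JSON file
scrapy crawl news -o example.json
```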