2012-06-25 24 views
21

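I'm scraping some data with complex hierarchical information and need to export the results to JSON. How can I implement nested items in Scrapy?

I defined the items as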

from scrapy.item import Item, Field

class FamilyItem(Item):
    name = Field()
    sons = Field()

class SonsItem(Item):
    name = Field()
    grandsons = Field()

class GrandsonsItem(Item):
    name = Field()
    age = Field()
    weight = Field()
    sex = Field()

After the spider finishes running, the printed item output looks like

{'name': 'Jenny', 
    'sons': [ 
      {'name': u'S1', 
      'grandsons': [ 
        {'name': u'GS1', 
        'age': 18, 
        'weight': 50 
        }, 
        { 
        'name':u'GS2', 
        'age': 19, 
        'weight':51}] 
        }] 
} 

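But when I run scrapy crawl myscaper -o a.json, it always fails with a "not JSON serializable" error. When I copy and paste the printed item output into an ipython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts...

Answers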

2

I'm not sure whether there is a way to declare nested items in the class itself, but lists work fine. You can do something like this:

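# Build the hierarchy bottom-up; plain dicts are assumed for the nested levels.
grandsons = []
grandson = {}
grandson['name'] = 'Grandson'
grandson['age'] = 2
grandsons.append(grandson)
sons = []
son = {}
son['name'] = 'Son'
son['grandsons'] = grandsons
sons.append(son)
item = FamilyItem()
item['name'] = 'Name'
item['sons'] = sons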
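
As a minimal sketch of why the list approach avoids the error: if every nested level is a plain Python dict or list (an assumption of this example; the names and values are only illustrative), the standard json module can serialize the whole structure without any help.

```python
import json

# Plain dicts and lists only; no Scrapy objects are involved in this sketch.
grandsons = [
    {"name": "GS1", "age": 18, "weight": 50},
    {"name": "GS2", "age": 19, "weight": 51},
]
sons = [{"name": "S1", "grandsons": grandsons}]
family = {"name": "Jenny", "sons": sons}

# json.dumps handles nested dicts and lists of basic types directly.
encoded = json.dumps(family)
decoded = json.loads(encoded)
print(decoded["sons"][0]["grandsons"][1]["name"])
```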
21

When storing nested items, make sure to wrap them in a call to dict(), for example:

gs1 = GrandsonsItem() 
gs1['name'] = 'GS1' 
gs1['age'] = 18 
gs1['weight'] = 50 

gs2 = GrandsonsItem() 
gs2['name'] = 'GS2' 
gs2['age'] = 19 
gs2['weight'] = 51 

s1 = SonsItem() 
s1['name'] = 'S1' 
s1['grandsons'] = [dict(gs1), dict(gs2)] 

jenny = FamilyItem() 
jenny['name'] = 'Jenny' 
jenny['sons'] = [dict(s1)] 
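
The reason the wrapping matters can be sketched without Scrapy installed. The example below assumes, as the error in the question suggests, that an item is dict-like but not a type the json module accepts directly. `FakeItem` and `to_plain` are hypothetical names invented for this illustration and are not part of Scrapy.

```python
import json
from collections.abc import MutableMapping


class FakeItem(MutableMapping):
    """Hypothetical stand-in for an item: dict-like, but not a dict subclass."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def to_plain(value):
    """Recursively convert mapping-like objects and lists into plain dicts/lists."""
    if isinstance(value, MutableMapping):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


gs1 = FakeItem(name="GS1", age=18, weight=50)
s1 = FakeItem(name="S1", grandsons=[gs1])  # nested item left unwrapped
jenny = FakeItem(name="Jenny", sons=[s1])

try:
    json.dumps(dict(jenny))  # only the top level becomes a plain dict
    failed = False
except TypeError:
    failed = True  # the nested FakeItem is rejected by the json encoder

plain = to_plain(jenny)  # every level is now a plain dict or list
encoded = json.dumps(plain)
```

A recursive helper like `to_plain` is one way to avoid calling dict() by hand at every level. Wrapping each nested item explicitly, as in the answer above, has the same effect.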
+0

Sir, you deserve a cookie!