Nested JSON items in Scrapy
Here is my basic Scrapy spider:
def parse(self, response):
    # CruiseItem is the project's own scrapy.Item subclass (defined in items.py)
    item = CruiseItem()
    item['Cruise'] = {}
    # Each .extract() returns a list of every matching text node on the page
    item['Cruise']['Cruiseline'] = response.xpath('//title/text()').extract()
    item['Cruise']['Itinerary'] = response.xpath('//*[@id="brochureName1"]/text()').extract()
    item['Cruise']['Price'] = response.xpath('//*[@id="interiorPrice1"]/text()').extract()
    item['Cruise']['PerNight'] = response.xpath('//*[@id="perNightinteriorPrice1"]/text()').extract()
    return item
This correctly pulls all the elements I want. My example JSON feed currently comes out as follows:
[
  {
    "Cruise": {
      "Cruiseline": [
        "Ship Name"
      ],
      "Itinerary": [
        "3 Night Bahamas ",
        "4 Night Western Caribbean ",
        "4 Night Bahamas ",
        "3 Night Bahamas ",
        "5 Night Western Caribbean ",
        "5 Night Eastern Caribbean ",
        "7 Night Western Caribbean ",
        "7 Night Southern Caribbean ",
        "6 Night Western Caribbean ",
        "7 Night Western Caribbean ",
        "8 Night Eastern Caribbean "
      ],
      "Price": [
        "$169",
        "$179",
        "$289",
        "$349",
        "$359",
        "$389",
        "$389",
        "$409",
        "$424",
        "$524",
        "$939"
      ],
      "PerNight": [
        "$56/night",
        "$45/night",
        "$72/night",
        "$116/night",
        "$72/night",
        "$78/night",
        "$56/night",
        "$58/night",
        "$71/night",
        "$75/night",
        "$117/night"
      ]
    }
  }
]
The target JSON output, however, is different:
[
  {
    "Cruise": {
      "Cruiseline": [
        "Ship Name"
      ],
      "Itinerary": [
        "3 Night Bahamas "
      ],
      "Price": [
        "$169"
      ],
      "PerNight": [
        "$56/night"
      ]
    }
  },
  {
    "Cruise": {
      "Cruiseline": [
        "Ship Name"
      ],
      "Itinerary": [
        "4 Night Bahamas "
      ],
      "Price": [
        "$79"
      ],
      "PerNight": [
        "$86/night"
      ]
    }
  }
]
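Basically, I want only one Cruiseline (ship), Itinerary, Price, and PerNight returned per cruise.
Does this make sense? I'd love to discuss.
Edit: I asked this question a few days ago, but decided to clarify it and repost. Thanks!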
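Because each `xpath(...).extract()` call returns every match on the page, the four fields come back as parallel lists. One way to get one record per cruise, assuming the i-th itinerary, price, and per-night entry all describe the same cruise, is to zip the lists together and emit one item per position. A minimal sketch in plain Python; the helper name `split_cruises` is hypothetical, and the commented `parse` shows how it might be wired into the spider by yielding one `CruiseItem` per record instead of returning a single item:

```python
def split_cruises(cruiseline, itineraries, prices, per_night):
    """Turn the parallel lists scraped from one page into one record per cruise.

    Assumes the lists are aligned by position: entry i of itineraries,
    prices and per_night all describe the same cruise. zip() stops at the
    shortest list, so if one field has fewer matches, the extra entries in
    the other lists are dropped.
    """
    records = []
    for itinerary, price, night in zip(itineraries, prices, per_night):
        records.append({
            "Cruise": {
                # The page title is shared by every cruise on the page.
                "Cruiseline": list(cruiseline),
                # Single-element lists, matching the target JSON shape.
                "Itinerary": [itinerary],
                "Price": [price],
                "PerNight": [night],
            }
        })
    return records


# Sketch of how parse() could use it (CruiseItem is the asker's Item class):
#
# def parse(self, response):
#     records = split_cruises(
#         response.xpath('//title/text()').extract(),
#         response.xpath('//*[@id="brochureName1"]/text()').extract(),
#         response.xpath('//*[@id="interiorPrice1"]/text()').extract(),
#         response.xpath('//*[@id="perNightinteriorPrice1"]/text()').extract(),
#     )
#     for record in records:
#         item = CruiseItem()
#         item['Cruise'] = record['Cruise']
#         yield item  # one item per cruise instead of a single return
```

Yielding inside the loop means Scrapy's JSON feed export writes one array element per cruise, which is the shape of the target output.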
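Thanks for this. I'm open to trying your idea, but I may need some help reworking the script. – Nathan
This doesn't really help; I don't know where this updated code should go. – Nathan
Well, if I can't see your whole codebase, I can't really tell you where the code should go. I'm assuming 'parse' is called multiple times somewhere, since your final data so far is an array. So basically, find the variable your JSON feed is stored in (say it's called cruise_list) and paste my code after it. (My code only works if your data variable is called 'cruise_list', so if your data variable is called 'x', do something like 'cruise_list = x' before your data is aggregated, or with 'cruise_list ' – mjkaufer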
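If the data has already been collected, the split can also be done after the crawl instead of inside `parse`. A minimal sketch, assuming the feed is a list of `{"Cruise": {...}}` objects whose fields are parallel lists, as in the example feed in the question; the variable name `cruise_list` follows the comment above, while `flatten_feed`, `flatten_file`, and the file paths are hypothetical:

```python
import json


def flatten_feed(cruise_list):
    """Expand each {'Cruise': {...parallel lists...}} entry into one entry per cruise."""
    flat = []
    for entry in cruise_list:
        cruise = entry["Cruise"]
        # Pair up the i-th itinerary, price and per-night value (assumed aligned).
        rows = zip(cruise["Itinerary"], cruise["Price"], cruise["PerNight"])
        for itinerary, price, night in rows:
            flat.append({
                "Cruise": {
                    "Cruiseline": cruise["Cruiseline"],
                    "Itinerary": [itinerary],
                    "Price": [price],
                    "PerNight": [night],
                }
            })
    return flat


def flatten_file(src_path, dst_path):
    """Read a JSON feed written by Scrapy and write the flattened version."""
    with open(src_path) as f:
        cruise_list = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(flatten_feed(cruise_list), f, indent=2)
```

For example, `flatten_file("cruises.json", "cruises_flat.json")` would turn the 11-cruise feed shown in the question into 11 separate objects.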