2017-06-14

Python Scrapy: join following-sibling rows

Sorry if my title is wrong. Basically, I want to scrape this data with Scrapy:

<tr> 
 
    <td colspan=2> 
 
     <h4>Ottawa Macdonald-Cartier International Airport runways</h4> 
 
    </td> 
 
</tr> 
 
</tr> 
 
<tr class="odd"> 
 
    <td><a href="ottawa-macdonald-cartier-international-airport-runway-04-22-extended-info_R234949.html" title="Ottawa Macdonald-Cartier International Airport runway 04/22 extended info"><b>04/22</b></a></td> 
 
    <td>3300x75 <small>ft.</small></td> 
 
</tr> 
 
<tr class="even"> 
 
    <td><a href="ottawa-macdonald-cartier-international-airport-runway-07-25-extended-info_R234950.html" title="Ottawa Macdonald-Cartier International Airport runway 07/25 extended info"><b>07/25</b></a></td> 
 
    <td>8000x200 <small>ft.</small></td> 
 
</tr> 
 
<tr class="odd"> 
 
    <td><a href="ottawa-macdonald-cartier-international-airport-runway-14-32-extended-info_R234951.html" title="Ottawa Macdonald-Cartier International Airport runway 14/32 extended info"><b>14/32</b></a></td> 
 
    <td>10000x200 <small>ft.</small></td> 
 
</tr> 
 
<tr class=""> ...

(Rows like these repeat, and how many there are differs from page to page.)

I want each row of my CSV output to be in JSON format, like this:

{'05/23': '3281x250 ft.','18/36': '3252x250 ft.'} 

but I always get this:

{05/23,18/36,3281x250 ,ft.,3252x250 ,ft.} 

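This is my code:

def parse_details(self, response):
    # All runway names on the page, e.g. ['04/22', '07/25', '14/32']
    runway1 = response.xpath(".//tr[contains(.,'runways')]/following-sibling::tr[@class]//td/a[contains(@title,'runway')]//text()").extract()
    # All text nodes of the size cells: "3300x75 " and "ft." come back as separate items
    runway2 = response.xpath(".//tr[contains(.,'runways')]/following-sibling::tr[@class]//td[contains(.,'ft.')]//text()").extract()
    # Concatenating the two lists and joining them loses the name/size pairing
    runway = runway1 + runway2
    runways = ','.join(runway)

    yield {'runways':'{'+runways+'}'}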

How can I make my code produce the output I want? I've searched every tutorial on this site but I'm still stuck. Thanks.
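To see why the output comes out flat, and how to keep each runway paired with its size, here is a minimal plain-Python sketch. The two lists are hypothetical, hard-coded stand-ins for what the XPath queries would return, assuming exactly one size string per row. Note that the question's //text() query actually returns "3300x75 " and "ft." as separate items, so the lists would only line up if each cell is read as a single string (for example with XPath string(), as in the second answer).

```python
import json

# Hypothetical stand-ins for one page's query results:
# one runway name per row and one size string per row.
runway_names = ["04/22", "07/25", "14/32"]
runway_sizes = ["3300x75 ft.", "8000x200 ft.", "10000x200 ft."]

# What the question's code does: concatenate and join, losing the pairing.
flat = ",".join(runway_names + runway_sizes)

# zip() pairs the i-th name with the i-th size; dict() builds the mapping.
runways = dict(zip(runway_names, runway_sizes))

print(flat)                 # 04/22,07/25,14/32,3300x75 ft.,8000x200 ft.,10000x200 ft.
print(json.dumps(runways))  # {"04/22": "3300x75 ft.", "07/25": "8000x200 ft.", "14/32": "10000x200 ft."}
```

json.dumps is only there to show the JSON form; in a Scrapy callback you would yield the dict itself and let the feed exporter serialize it.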

Answers

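# Reads only the first "odd" row and the first "even" row on the page
key_1 = response.xpath('//tr[@class="odd"]//a/b/text()').extract_first()
# td[2]/text() returns only the cell's direct text ("3300x75 "); the "ft." inside <small> is not included
value_1 = response.xpath('//tr[@class="odd"]//td[2]/text()').extract_first()

key_2 = response.xpath('//tr[@class="even"]//a/b/text()').extract_first()
value_2 = response.xpath('//tr[@class="even"]//td[2]/text()').extract_first()

yield {key_1: value_1, key_2: value_2}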
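The fixed key_1/key_2 version reads exactly one odd and one even row. To handle any number of rows, loop over the rows instead. The sketch below does that with Python's standard-library html.parser rather than Scrapy selectors, purely so it runs without Scrapy installed; RunwayRowParser is a hypothetical helper, and the HTML is a trimmed copy of the question's markup. Unlike the XPath in the next answer, it does not check that the rows follow the "runways" header, so on a page with other classed rows it would collect those too.

```python
from html.parser import HTMLParser


class RunwayRowParser(HTMLParser):
    """Hypothetical helper: collect {runway: size} from every <tr> with a class attribute."""

    def __init__(self):
        super().__init__()
        self.runways = {}
        self._in_row = False   # inside a <tr> that has a class attribute
        self._in_cell = False  # inside a <td> of such a row
        self._cells = []       # accumulated text of each <td> in the current row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Like XPath tr[@class]: any class attribute, even class="", counts.
            self._in_row = "class" in dict(attrs)
            self._cells = []
        elif tag == "td" and self._in_row:
            self._in_cell = True
            self._cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._in_row:
            if len(self._cells) >= 2:
                name = self._cells[0].strip()
                # Collapse whitespace so "3300x75 " + "ft." becomes "3300x75 ft."
                size = " ".join(self._cells[1].split())
                self.runways[name] = size
            self._in_row = False

    def handle_data(self, data):
        # Text from nested tags (<a>, <b>, <small>) is appended to the current cell.
        if self._in_cell:
            self._cells[-1] += data


html = """
<tr>
  <td colspan=2><h4>Ottawa Macdonald-Cartier International Airport runways</h4></td>
</tr>
</tr>
<tr class="odd"><td><a><b>04/22</b></a></td><td>3300x75 <small>ft.</small></td></tr>
<tr class="even"><td><a><b>07/25</b></a></td><td>8000x200 <small>ft.</small></td></tr>
<tr class="odd"><td><a><b>14/32</b></a></td><td>10000x200 <small>ft.</small></td></tr>
"""

parser = RunwayRowParser()
parser.feed(html)
print(parser.runways)  # {'04/22': '3300x75 ft.', '07/25': '8000x200 ft.', '14/32': '10000x200 ft.'}
```

Because the dict is built row by row, a page with two runways and a page with five produce two and five entries respectively.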

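Hi, thanks for the answer, and sorry: I've edited my question. I forgot to mention that the results differ on each page: sometimes there are two rows (odd and even), sometimes five. – Meganz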


You can loop over the sibling rows that follow the header tr and get each key/value pair:

In [1]: response = scrapy.Selector(text='''<tr> 
    ...:  <td colspan=2> 
    ...:   <h4>Ottawa Macdonald-Cartier International Airport runways</h4> 
    ...:  </td> 
    ...: </tr> 
    ...: </tr> 
    ...: <tr class="odd"> 
    ...:  <td><a href="ottawa-macdonald-cartier-international-airport-runway-04-22-extended-info_R234949.html" title="Ottawa Macdonald-Cartier International Airport runway 04/22 extended info"><b>04/22</b></a></td> 
    ...:  <td>3300x75 <small>ft.</small></td> 
    ...: </tr> 
    ...: <tr class="even"> 
    ...:  <td><a href="ottawa-macdonald-cartier-international-airport-runway-07-25-extended-info_R234950.html" title="Ottawa Macdonald-Cartier International Airport runway 07/25 extended info"><b>07/25</b></a></td> 
    ...:  <td>8000x200 <small>ft.</small></td> 
    ...: </tr> 
    ...: <tr class="odd"> 
    ...:  <td><a href="ottawa-macdonald-cartier-international-airport-runway-14-32-extended-info_R234951.html" title="Ottawa Macdonald-Cartier International Airport runway 14/32 extended info"><b>14/32</b></a></td> 
    ...:  <td>10000x200 <small>ft.</small></td> 
    ...: </tr>''') 



In [2]: {tr.xpath('string(.//td/a[contains(@title,"runway")])').get(): 
    ...:  tr.xpath('string(.//td[contains(.,"ft.")])').get() 
    ...: for tr in response.xpath('.//tr[contains(., "runways")]/following-sibling::tr[@class]') } 
    ...: 
Out[2]: 
{u'04/22': u'3300x75 ft.', 
u'07/25': u'8000x200 ft.', 
u'14/32': u'10000x200 ft.'} 
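The difference between the question's //text() and the string() used here is easy to see in isolation. As a stand-in for Scrapy's XPath engine, the sketch below uses the standard library's xml.etree.ElementTree on one size cell from the question: itertext() yields the cell's text nodes one by one (like //text()), while joining them gives the single concatenated value that XPath string() returns.

```python
import xml.etree.ElementTree as ET

# One size cell from the question (well-formed, so ElementTree can parse it).
cell = ET.fromstring("<td>3300x75 <small>ft.</small></td>")

# Like XPath //text(): every descendant text node is a separate item.
text_nodes = list(cell.itertext())

# Like XPath string(.): all descendant text concatenated into one value.
as_string = "".join(cell.itertext())

print(text_nodes)  # ['3300x75 ', 'ft.']
print(as_string)   # 3300x75 ft.
```

Getting one string per cell is what keeps each size aligned with its runway name.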

An example callback could look like this:

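def parse_details(self, response):
    # Runway rows are the <tr class="..."> siblings after the "runways" header row
    rows = response.xpath('.//tr[contains(., "runways")]/following-sibling::tr[@class]')
    # Yield one item per page, {runway: dimensions}, matching Out[2] above
    yield {tr.xpath('string(.//td/a[contains(@title,"runway")])').get():
           tr.xpath('string(.//td[contains(.,"ft.")])').get()
           for tr in rows}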

Hi, thanks for the answer. Could you show me an example parse def that yields the output of the command above? I'm still new to Python. – Meganz