Python BeautifulSoup: extracting a title in a web scraper

I want to extract the title from an image. I managed to extract the URL, but I don't know how to write the code that extracts the image's title.

import requests
from bs4 import BeautifulSoup

def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        # Build the URL of the current results page
        url = 'http://www.gumtree.com.au/s-cars-vans-utes/melbourne/page-' + str(page) + '/c1832013001317'
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so the result does not depend on what is installed
        soup = BeautifulSoup(plain_text, 'html.parser')
        for link in soup.find_all('a', {'class': 'ad-listing_title-link'}):
            # The href is site-relative and already starts with '/'
            href = 'http://www.gumtree.com.au' + link.get('href')
            print(href)
        page += 1

trade_spider(1)

The HTML is:

<a itemprop="url" class="ad-listing__thumb-link" name="1124692138" href="/s-ad/derrimut/cars-vans-utes/2015-toyota-86-coupe-12-month-warranty-/1124692138" data-ref="searchTopAd"> 
    <span id="r-image-TOP_AD-1124692138" title="2015 Toyota 86 Coupe **12 MONTH WARRANTY** Derrimut Brimbank Area Preview" class="j-responsive-image ad-listing__thumb" data-index="1">...</span> 
</a>

The first line is the href, but what I want is the title that appears in the span block of the HTML.

Thanks!

Source

2017-01-24 Chris

Post your code as text, not as an image. –

Can you add the URL here? It is hard to work from a picture of the code. –

link.span.get('title')

Use . (attribute access) to find the next span inside the link, then get its title.
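As a minimal sketch of the dot access described above, the snippet below runs against the HTML from the question instead of the live site. The helper name `extract_titles` is hypothetical, and it assumes the anchors of interest carry the `ad-listing__thumb-link` class shown in that HTML. `link.span` returns the first `<span>` inside the link, or `None` when there is none, so the sketch checks for that before reading the attribute.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (the span body is elided there too).
html = '''
<a itemprop="url" class="ad-listing__thumb-link" name="1124692138" href="/s-ad/derrimut/cars-vans-utes/2015-toyota-86-coupe-12-month-warranty-/1124692138" data-ref="searchTopAd">
    <span id="r-image-TOP_AD-1124692138" title="2015 Toyota 86 Coupe **12 MONTH WARRANTY** Derrimut Brimbank Area Preview" class="j-responsive-image ad-listing__thumb" data-index="1">...</span>
</a>
'''

def extract_titles(markup):
    """Hypothetical helper: collect the title of the first span in each thumb link."""
    soup = BeautifulSoup(markup, 'html.parser')
    titles = []
    for link in soup.find_all('a', {'class': 'ad-listing__thumb-link'}):
        span = link.span  # first <span> descendant, or None if the link has none
        if span is not None and span.get('title'):
            titles.append(span.get('title'))
    return titles

titles = extract_titles(html)
print(titles)
```

Inside the question's loop, the same idea would amount to printing `link.span.get('title')` next to the href, provided the selected link actually contains a span.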

Use a regex to match the string in the attribute:

import re  
soup.find('span', id=re.compile(r'r-image'))
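A slightly fuller sketch of the regex approach, for the case where the number at the end of the id changes from post to post. It assumes every id of interest has the form `r-image-TOP_AD-<digits>`, as in the question's HTML. The second and third spans in the sample are made up for illustration, and `titles_by_id` is a hypothetical helper name. Beautiful Soup applies a compiled pattern to the attribute value with `search`, so the `^` and `$` anchors keep the match strict.

```python
import re
from bs4 import BeautifulSoup

# First span is from the question; the other two are invented for illustration.
html = '''
<span id="r-image-TOP_AD-1124692138" title="2015 Toyota 86 Coupe **12 MONTH WARRANTY** Derrimut Brimbank Area Preview">...</span>
<span id="r-image-TOP_AD-0000000001" title="Hypothetical second listing">...</span>
<span id="unrelated" title="Not a listing">...</span>
'''

# Fixed prefix, then any run of digits, so the changing number does not matter.
ID_PATTERN = re.compile(r'^r-image-TOP_AD-\d+$')

def titles_by_id(markup):
    """Hypothetical helper: titles of all spans whose id matches ID_PATTERN."""
    soup = BeautifulSoup(markup, 'html.parser')
    return [span.get('title') for span in soup.find_all('span', id=ID_PATTERN)]

matched = titles_by_id(html)
print(matched)
```

If listings that are not top ads use a different middle segment, loosening the pattern to something like `^r-image-` may be enough; that is a guess about the page, not something the question confirms.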

Source

2017-01-24 09:36:35

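OK, I managed to get it working with link.get('title'). If I wanted to reference the 'id' instead, using 'r-image-TOP_AD-1124692138', how could I do that when the number after -TOP_AD- changes for every post? – Chris

Awesome, thank you! – Chris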
