2017-05-19

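Python - How to use finditer regex?

I want to find every instance of img src="([^"]+)" that is preceded by div class="grid" and followed by div class="orderplacebut" in some HTML code, i.e. I want to find all the images inside the grid div container.

If I use findall, it returns only one image, because div class="grid" appears only once on the page, so the pattern can match only once. I would like to iterate the regex so that it runs again and returns the second image URL, then the third, and so on. Is this possible with finditer, and how would I use it in my code?

The code below is my findall regex, which returns only one URL.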

from urllib import urlopen 
from re import findall 
import re 

dennisov_url = 'https://denissov.ru/en/' 
dennisov_html = urlopen(dennisov_url).read() 

# Print all images between div class="grid" and div class="orderplacebut" 
# Because the regex spans over several lines, use DOTALL flag to include 
# every character between, including new lines 

watch_image_urls = findall('<div class="grid".*<img src="([^"]+)".*<div class="orderplacebut"', dennisov_html, flags=re.DOTALL) 
print watch_image_urls 

Answer

Really, use a different approach with a parser (untested, because the .ru domain is blocked here):

import requests 
from bs4 import BeautifulSoup 

dennisov_url = 'https://denissov.ru/en/' 
dennisov_html = requests.get(dennisov_url) 
soup = BeautifulSoup(dennisov_html.text, 'lxml') 

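# '>' matches only <img> tags that are direct children of div.grid;
# use 'div.grid img' instead if the images are nested deeper.
images = soup.select('div.grid > img')
image_urls = [img.get('src') for img in images]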
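
If you would rather stay with regular expressions, here is a minimal sketch of how finditer could be used, written for Python 3. The single findall pattern returns one URL because each greedy .* swallows everything between the two div markers, so the whole region is consumed by one match with one captured group. Splitting the work into two steps avoids that: first cut out the region, then iterate over the img tags inside it. The sample_html string and the grid_image_urls helper are made up for illustration and are not taken from the real page, whose markup may differ.

```python
import re

# Hypothetical markup standing in for the real page (an assumption).
sample_html = '''
<div class="grid">
  <a href="/w1"><img src="/img/watch1.jpg" alt="one"></a>
  <a href="/w2"><img src="/img/watch2.jpg" alt="two"></a>
  <div class="orderplacebut">Order</div>
</div>
<img src="/img/footer.jpg">
'''

def grid_image_urls(html):
    """Return the src of every <img> between the two div markers."""
    # Step 1: isolate the region from div.grid to the last div.orderplacebut.
    # DOTALL lets '.' match newlines, as in the original pattern.
    region = re.search(r'<div class="grid".*<div class="orderplacebut"',
                       html, flags=re.DOTALL)
    if region is None:
        return []
    # Step 2: finditer yields one match object per <img>;
    # group(1) is the captured src value.
    return [m.group(1)
            for m in re.finditer(r'<img src="([^"]+)"', region.group(0))]

print(grid_image_urls(sample_html))
```

finditer returns match objects lazily, which is handy if you also need positions (m.start()) or several groups; if only the URLs matter, findall with the same inner pattern on the isolated region would give the same list.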