Web scraping with Python and Beautiful Soup

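I am practicing building web scrapers. The one I am working on now involves going to a site, scraping the links for the various cities on that site, then taking all of the links for each city and scraping all of the links found on those pages.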

I'm using the following code:

import requests 

from bs4 import BeautifulSoup 

main_url = "http://www.chapter-living.com/" 

# Getting individual cities url 
re = requests.get(main_url) 
soup = BeautifulSoup(re.text, "html.parser") 
city_tags = soup.find_all('a', class_="nav-title") # Bottom of page is not loaded dynamically
cities_links = [main_url + tag["href"] for tag in city_tags.find_all("a")] # Links to cities

If I print city_tags, I get the HTML I want. However, when I print cities_links, I get AttributeError: 'ResultSet' object has no attribute 'find_all'.

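I gather from other questions on here that this error occurs because city_tags returns None, but that can't be the case if it prints out the desired HTML, can it? I have noticed that said HTML is wrapped in [] - does that make a difference?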


2017-03-16 Maverick

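As the error says, city_tags is a ResultSet, which is a list of nodes, and it does not have a find_all method. You can either loop through the set and apply find_all to each individual node or, in your case, I think you can simply extract the href attribute from each node: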

[tag['href'] for tag in city_tags] 

#['https://www.chapter-living.com/blog/', 
# 'https://www.chapter-living.com/testimonials/', 
# 'https://www.chapter-living.com/events/']
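The other option mentioned above, looping over the set and calling find_all on each node, might look like the following minimal sketch. The HTML snippet and the nav-block class are made up for illustration and are not taken from the site in the question; the point is only that find_all is available on each Tag, not on the ResultSet itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example (not the real page).
html = """
<div class="nav-block"><a href="/london/">London</a><a href="/leeds/">Leeds</a></div>
<div class="nav-block"><a href="/blog/">Blog</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
blocks = soup.find_all("div", class_="nav-block")  # a ResultSet (list-like)

# A ResultSet behaves like a list, so it is never None, even when empty.
print(isinstance(blocks, list), len(blocks))

# find_all exists on each Tag, so call it per node and flatten the results.
links = [a["href"] for block in blocks for a in block.find_all("a")]
print(links)
```

This also explains the [] in the printed output: it is the list-like ResultSet wrapping the matched tags, not a sign that the result is None.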


2017-03-16 17:50:47 Psidom

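Well, city_tags is a bs4.element.ResultSet (essentially a list) of tags, and you are calling find_all on it. You probably want to call find_all on every element of the result set or, in this specific case, just retrieve their href attribute: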

import requests 
from bs4 import BeautifulSoup 

main_url = "http://www.chapter-living.com/" 

# Getting individual cities url 
re = requests.get(main_url) 
soup = BeautifulSoup(re.text, "html.parser") 
city_tags = soup.find_all('a', class_="nav-title") # Bottom of page is not loaded dynamically
cities_links = [main_url + tag["href"] for tag in city_tags] # Links to cities
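One caveat, judging by the output shown in the earlier answer: the href values on this page already appear to be absolute URLs, in which case prefixing main_url would produce malformed links. A minimal sketch using urljoin from the standard library's urllib.parse, which copes with both relative and absolute hrefs; the sample href values are illustrative and the relative one is hypothetical:

```python
from urllib.parse import urljoin

main_url = "http://www.chapter-living.com/"

# Sample href values: the first is absolute (as in the output shown earlier),
# the second is a hypothetical relative link.
hrefs = ["https://www.chapter-living.com/blog/", "/events/"]

# urljoin leaves absolute URLs unchanged and resolves relative ones against the base.
cities_links = [urljoin(main_url, href) for href in hrefs]
print(cities_links)
```

In the real scraper, hrefs would be replaced by tag["href"] for each tag in city_tags.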


2017-03-16 17:50:50
