Python requests and beautifulsoup4: collecting only the "href" links

from bs4 import BeautifulSoup 
import requests 

url = "https://www.brightscope.com/ratings" 
headers = {'User-Agent':'Mozilla/5.0'} 
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser") 

data = soup.find_all('li',{"class":"more-data"})+soup.findAll('li', {"class":"more-data topten"}) 
for item in data: 
    print(item('a'))

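I want to print only the hrefs, but I can't seem to figure it out. I've watched several videos and still can't get it. What exactly am I doing wrong? I know the code above prints the contents of the "a" tags, but I only need the href values.

Source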

2016-12-19 Kamikaze_goldfish

What you need is dictionary-like access to the element's attributes:

[a['href'] for a in item('a')]
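As a minimal sketch of that attribute access, the snippet below parses a small inline HTML fragment rather than the live page. The markup and the paths in it are made up for illustration; only `item('a')` and `a['href']` come from the code above. It also shows one way to cope with an a tag that has no href, which would otherwise raise a KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one <li class="more-data"> item.
html = """
<li class="more-data">
  <a href="/ratings/a">First</a>
  <a href="/ratings/b">Second</a>
  <a name="no-link">Anchor without href</a>
</li>
"""
item = BeautifulSoup(html, "html.parser").li

# item('a') is shorthand for item.find_all('a').
# href=True keeps only the tags that actually have an href attribute,
# so a['href'] cannot raise KeyError here.
hrefs = [a["href"] for a in item("a", href=True)]
print(hrefs)

# a.get('href') returns None instead of raising when the attribute is missing.
maybe_hrefs = [a.get("href") for a in item("a")]
print(maybe_hrefs)
```

Whether to filter with `href=True` or to use `.get()` depends on whether you want to skip link-less anchors or see them as None.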

Also, as a side note, you can improve the way you locate the li elements. Instead of:

data = soup.find_all('li',{"class":"more-data"})+soup.findAll('li', {"class":"more-data topten"}) 
for item in data: 
    print(item('a'))

you can do this:

links = soup.select("li.more-data a") 
for a in links: 
    print(a["href"])

Here li.more-data a is a CSS selector that matches every a element inside an li element that has the more-data class.

Source
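To see the selector at work without hitting the network, here is a small sketch against invented markup (the list items and paths are assumptions, not the real page). A CSS class selector matches any element that carries that class among its classes, so li.more-data should also cover the items marked "more-data topten". The `[href]` part is an optional addition that restricts the match to a tags that have an href.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on the classes named in the question.
html = """
<ul>
  <li class="more-data"><a href="/ratings/one">One</a></li>
  <li class="more-data topten"><a href="/ratings/two">Two</a></li>
  <li class="other"><a href="/ratings/skip">Skip</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# One selector covers both "more-data" and "more-data topten" items,
# and skips the <li> that lacks the more-data class.
links = [a["href"] for a in soup.select("li.more-data a[href]")]
for link in links:
    print(link)
```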

2016-12-19 04:47:14 alecxe
