2017-02-16 14 views

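BeautifulSoup - return the header corresponding to a matching footer

I am using BeautifulSoup to retrieve artist names from a blog, based on a specific match on the music tags: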

import requests 
from bs4 import BeautifulSoup 

r = requests.get('http://musicblog.kms-saulgau.de/tag/chillout/') 
html = r.content 

soup = BeautifulSoup(html, 'html.parser') 

The artist names are all stored here:

header = soup.find_all('header', class_= "entry-header") 

and the artist tags are here:

span = soup.find_all('span', class_= "tags-links") 

I can get all the titles:

for each in header:
    link = each.find("a")
    if link:
        print(link.get_text())

Then I look for 'alternative' and 'chillout' in the same footer:

for each in span:
    link = each.find("a")  # only the first tag link in the footer
    if link and "alternative" in link["href"]:
        print(each.get_text())

So far, the code prints:

Terra Nine – The Heart of the Matter 
Emmit Fenn – Blinded 
Amparo – The Orchid Glacier 
Alpha Minus – Satellites 
Carbonates on Mars – The Song of Sol 
Josey Marina – Ocean Sighs 
Sunday – Only 
Some Kind Of Illness – The Light 
Vesna Kazensky – Raven 
James Lowe – Shallow 

Tags Alternative, Chillout, Indie Rock, New tracks 

But what I want is to return only the entry that corresponds to the matching footer, like this:

Some Kind Of Illness – The Light 
Alternative, Chillout, Indie Rock, New tracks 

How can I achieve this?

Answer

for article in soup.find_all('article'):
    if article.select('a[href*="alternative"]') and article.select('a[href*="chillout"]'):
        print(article.h2.text)
        print(article.find(class_='tags-links').text)

Output:

Some Kind Of Illness – The Light 
Tags Alternative, Chillout, Indie Rock, New tracks
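
The key point is to walk each article once, so that the title and the tag footer are read from the same parent element instead of from two unrelated lists. Here is a minimal offline sketch of that idea. It assumes each post is an `article` containing a `header.entry-header` link and a `span.tags-links` footer, as described in the question. `SAMPLE_HTML`, `matching_entries` and the two "Example Artist" entries are made up for illustration and are not taken from the real page. Unlike the `href*=` substring test, this variant compares the tag link text exactly, so a tag such as "Alternative Rock" would not count as "Alternative".

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the structure described in the question;
# the real page may be laid out differently.
SAMPLE_HTML = """
<article>
  <header class="entry-header"><h2><a href="/post-1/">Some Kind Of Illness – The Light</a></h2></header>
  <footer><span class="tags-links">Tags
    <a href="/tag/alternative/">Alternative</a>, <a href="/tag/chillout/">Chillout</a>,
    <a href="/tag/indie-rock/">Indie Rock</a>, <a href="/tag/new-tracks/">New tracks</a>
  </span></footer>
</article>
<article>
  <header class="entry-header"><h2><a href="/post-2/">Example Artist A – Example Track</a></h2></header>
  <footer><span class="tags-links">Tags
    <a href="/tag/chillout/">Chillout</a>, <a href="/tag/electronic/">Electronic</a>
  </span></footer>
</article>
<article>
  <header class="entry-header"><h2><a href="/post-3/">Example Artist B – Example Track</a></h2></header>
  <footer><span class="tags-links">Tags
    <a href="/tag/alternative-rock/">Alternative Rock</a>, <a href="/tag/chillout/">Chillout</a>
  </span></footer>
</article>
"""


def matching_entries(html, required_tags):
    """Return (title, tags) pairs for articles that carry every required tag."""
    soup = BeautifulSoup(html, "html.parser")
    required = {tag.lower() for tag in required_tags}
    results = []
    for article in soup.find_all("article"):
        title_link = article.select_one("header.entry-header a")
        footer = article.find(class_="tags-links")
        if title_link is None or footer is None:
            continue  # skip articles without a title link or a tag footer
        tags = [a.get_text(strip=True) for a in footer.find_all("a")]
        # Exact, case-insensitive comparison of tag names (not href substrings).
        if required <= {tag.lower() for tag in tags}:
            results.append((title_link.get_text(strip=True), tags))
    return results


for title, tags in matching_entries(SAMPLE_HTML, ["Alternative", "Chillout"]):
    print(title)
    print(", ".join(tags))
```

To try it against the live page, pass `r.content` from the question in place of `SAMPLE_HTML`. Whether the class names still match depends on the blog's current theme.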