2017-09-04 58 views

Answer


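You can use requests to fetch the HTML and then BeautifulSoup to parse it. The following code looks for the HTML comment that appears just before the start of the lyrics, then finds the parent <div> that contains it. The lyrics can be extracted from that element's text: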

import requests
from bs4 import BeautifulSoup, Comment

# Send a browser-like User-Agent; the site may reject the default one sent by requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36'}
r = requests.get("https://www.azlyrics.com/lyrics/runthejewels/closeyoureyesandcounttofuck.html", headers=headers)
soup = BeautifulSoup(r.content, "html.parser")

# Find the HTML comment that sits just before the lyrics, then print
# the text of the <div> that contains it (Python 3 print syntax)
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if "Usage of azlyrics.com content" in comment:
        print(comment.parent.text)
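To try the comment-to-parent technique without hitting the site, here is a minimal sketch that runs against a local HTML string. The SAMPLE_HTML markup and the extract_lyrics helper are hypothetical: they only assume the layout described above (a <div> holding a comment containing "Usage of azlyrics.com content", followed by the lyrics), not the real page's exact markup.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup mimicking the assumed layout: the lyrics <div> begins
# with an HTML comment; the comment text here is a placeholder.
SAMPLE_HTML = """
<html><body>
<div class="ringtone">ad</div>
<div>
<!-- Usage of azlyrics.com content (placeholder text) -->
[Zack De La Rocha:]<br>
Run them jewels fast, run them, run them jewels fast<br>
</div>
</body></html>
"""


def extract_lyrics(html, marker="Usage of azlyrics.com content"):
    """Hypothetical helper: return the stripped text of the element that
    contains the first comment including `marker`, or None if none exists."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if marker in comment:
            # get_text() skips Comment nodes, so only the lyrics remain
            return comment.parent.get_text().strip()
    return None


print(extract_lyrics(SAMPLE_HTML))
```

Returning None instead of printing makes the "comment not found" case explicit, which is useful if the page layout changes.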

This will give you something to start with:

[Zack De La Rocha:] 
Run them jewels fast, run them, run them jewels fast 
... 

If needed, both libraries can be installed as follows:

pip install beautifulsoup4 
pip install requests 

Thank you very much! :)