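Extracting data from the title tag using BeautifulSoup?
I want to fetch a link's HTML with Python's BeautifulSoup library and then extract the title. Basically, the whole title tag is: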
<title>Imaan Z Hazir on Twitter: &quot;Guantanamo and Abu Ghraib, financial and military support to dictators in Latin America during the cold war. REALLY, AMERICA? (3)&quot;</title>
The data I want to extract is the part between the &quot; marks, which is just this: Guantanamo and Abu Ghraib, financial and military support to dictators in Latin America during the cold war. REALLY, AMERICA? (3)
I tried this:
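Once the title text is available as a plain string, getting the quoted part is a string operation rather than a parsing one. A minimal sketch, assuming the title always follows the pattern Name on Twitter: "tweet text"; the helper name extract_quoted is hypothetical. It slices between the first and the last double quote, so quotes inside the tweet itself are kept:

```python
def extract_quoted(title: str) -> str:
    """Return the text between the first and the last double quote in title.

    Hypothetical helper; assumes a title like 'Name on Twitter: "tweet text"'.
    If the title contains fewer than two double quotes, it is returned unchanged.
    """
    start = title.find('"')   # index of the first double quote, or -1
    end = title.rfind('"')    # index of the last double quote, or -1
    if start == -1 or end <= start:
        return title
    return title[start + 1:end]


title = ('Imaan Z Hazir on Twitter: "Guantanamo and Abu Ghraib, financial and '
         'military support to dictators in Latin America during the cold war. '
         'REALLY, AMERICA? (3)"')
print(extract_quoted(title))
```

In the code below, the string could come from data.title.get_text(), since BeautifulSoup has already turned &quot; into a literal " character by then.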
import urllib.error
import urllib.request
from bs4 import BeautifulSoup

link = "https://twitter.com/ImaanZHazir/status/778560899061780481"
try:
    List = list()
    r = urllib.request.Request(link, headers={'User-Agent': 'Chrome/51.0.2704.103'})
    h = urllib.request.urlopen(r).read()
    data = BeautifulSoup(h, "html.parser")
    # collect the text of every <title> tag; this prints the whole title, quotes included
    for i in data.find_all("title"):
        List.append(i.text)
    print(List[0])
except urllib.error.HTTPError as err:
    pass
I also tried:
for i in data.find_all("title.&quot;"):
for i in data.find_all("title>&quot;"):
for i in data.find_all("&quot;"):
and
for i in data.find_all("quot"):
but none of them work.
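These attempts cannot match anything because find_all() searches for tag names, and &quot; is a character entity inside the title's text, not a tag. By the time the document is parsed, the entity has already been decoded to a literal ". The sketch below uses only the standard-library html.parser module (the parser behind BeautifulSoup's "html.parser" backend) to illustrate this; the class name TitleParser is hypothetical and this is not BeautifulSoup's own code:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical parser that records tag names and the text inside <title>."""

    def __init__(self):
        # convert_charrefs defaults to True, so entities such as &quot;
        # reach handle_data already decoded to '"'
        super().__init__()
        self.tags = []        # every start tag seen, in order
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data  # text may arrive in several chunks


parser = TitleParser()
parser.feed('<html><head><title>Imaan Z Hazir on Twitter: '
            '&quot;Guantanamo and Abu Ghraib&quot;</title></head></html>')
parser.close()
print(parser.tags)
print(parser.title)
```

The only tags the parser reports are html, head and title; there is no "quot" tag to find, while the title text now contains ordinary double quotes.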
I would expect BeautifulSoup to convert '&quot;' into '"', so you just need to look for '"'. – zvone
@zvone What is that? Do you mean the '"' right after '<title>', as in 'title>"'? – Amar