How can I extract the meta description from a URL using Python?

Viewing the page source of: http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/

The source contains the following snippet:

<title>Book a Virgin Australia Flight | Virgin Australia 
</title> 
    <meta name="keywords" content="" /> 
     <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

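I want the title and the content of the meta description tag.

I used Goose, but it did not extract them properly. Here is my code:

website_title = [g.extract(url).title for url in clean_url_data]

and

website_meta_description = [g.extract(urlw).meta_description for urlw in clean_url_data]

Both results are empty.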

Source

2016-06-24 Technologic27

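How about BeautifulSoup? - https://www.crummy.com/software/BeautifulSoup/

Please take a look at BeautifulSoup as a solution.

For the question above, you can use the following code to extract the "description" information: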

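import requests
from bs4 import BeautifulSoup

url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(response.text, 'html.parser')

metas = soup.find_all('meta')

# Keep the content of every <meta> tag whose name attribute is "description".
print([meta.attrs['content'] for meta in metas if 'name' in meta.attrs and meta.attrs['name'] == 'description'])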

Output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
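
The question also asks for the page title, which the snippet above does not return. The following is a minimal sketch of one way to get both values with BeautifulSoup. The helper name extract_title_and_description and the sample_html string are illustrative assumptions; the sample wraps the snippet from the question in a bare page so that the example runs without a network request.

```python
from bs4 import BeautifulSoup


def extract_title_and_description(html):
    """Return (title, description) from an HTML string; None for a missing part."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag.
    title = soup.title.get_text(strip=True) if soup.title else None
    # "name" clashes with find()'s own first parameter, so pass it via attrs.
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content") if meta else None
    return title, description


# Hypothetical sample page built from the snippet in the question.
sample_html = """<html><head>
<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
    <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
</head><body></body></html>"""

title, description = extract_title_and_description(sample_html)
print(title)
print(description)
```

For a live page, the same helper could be called with requests.get(url).text. Returning None instead of indexing into a list keeps a page without the tag from raising an exception.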

Source

2016-06-24 10:17:36 linpingta

Do you know XPath for HTML? Using the lxml library with XPath is a fast way to extract HTML elements.

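import lxml.html

# html_content is the page's HTML, e.g. requests.get(url).content
doc = lxml.html.document_fromstring(html_content)
title_element = doc.xpath("//title")
website_title = title_element[0].text_content().strip()
# The tag is <meta name="description" content="...">: match on @name and
# read the content attribute, because a <meta> element has no text.
meta_description_element = doc.xpath("//meta[@name='description']")
website_meta_description = meta_description_element[0].get('content').strip()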
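
Indexing the XPath result with [0] raises an IndexError when a page has no matching tag. As a hedged alternative, the sketch below wraps each expression in the XPath string() function, which yields an empty string when nothing matches. The function name extract_with_xpath and the sample_html string are illustrative assumptions, not part of the original answer.

```python
import lxml.html


def extract_with_xpath(html):
    """Return (title, description); each is '' when the tag is absent."""
    doc = lxml.html.document_fromstring(html)
    # string(...) converts the first matching node to text, or '' if none match.
    title = doc.xpath("string(//title)").strip()
    description = doc.xpath(
        "string(//meta[@name='description']/@content)"
    ).strip()
    return title, description


# Hypothetical sample page built from the snippet in the question.
sample_html = """<html><head>
<title>Book a Virgin Australia Flight | Virgin Australia
</title>
    <meta name="keywords" content="" />
    <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />
</head><body></body></html>"""

xpath_title, xpath_description = extract_with_xpath(sample_html)
print(xpath_title)
print(xpath_description)
```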

Source

2016-06-24 10:29:23 shangliuyan
