How do I get an absolute URL from XPath?

I'm using the following code to get the URL of an item:

node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']

It gives me something like this:

itunes20170107.tbz

However, I want the full URL, which is:

https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170109.tbz

Is there a simple way to get the full URL from lxml without building it myself?

2017-01-09 David542

lxml.html only gives you the href exactly as it is written in the HTML. If you want the link to be absolute rather than relative, use urljoin():

from urllib.parse import urljoin # Python3 
# from urlparse import urljoin # Python2 

# The trailing slash matters: without it, urljoin() replaces the last
# path segment ("current") instead of appending to it.
url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/" 

relative_url = node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href'] 
absolute_url = urljoin(url, relative_url)

Demo:

>>> from urllib.parse import urljoin # Python3 
>>> 
>>> url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/" 
>>> 
>>> relative_url = "itunes20170107.tbz" 
>>> absolute_url = urljoin(url, relative_url) 
>>> absolute_url 
'https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170107.tbz'
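One detail worth checking when using urljoin(): whether the last path segment of the base URL survives depends on whether the base ends with a slash. The following is a minimal sketch of that behavior using the URL from the question; the variable names are only illustrative.

```python
from urllib.parse import urljoin

base_without_slash = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current"
base_with_slash = base_without_slash + "/"
href = "itunes20170107.tbz"

# Without a trailing slash, "current" is treated like a file name
# and is replaced by the relative reference.
replaced = urljoin(base_without_slash, href)

# With a trailing slash, "current/" is treated like a directory
# and the relative reference is appended to it.
appended = urljoin(base_with_slash, href)

print(replaced)
print(appended)
```

If the page URL you fetched the listing from ends in "current" without a slash, appending the slash before joining is probably what you want.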

2017-01-09 20:54:43 alecxe

Another way to do this:

import requests 
from lxml.html import fromstring 

url = 'http://server.com' 
response = requests.get(url) 
doc = fromstring(response.text) 
# Rewrite every link in the parsed document in place, resolved against url
doc.make_links_absolute(url)
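To see how this combines with the XPath from the question, here is a small sketch that skips the network request and parses an inline HTML string instead. The HTML snippet is made up for illustration, and the base URL is assumed to be the directory from the question with a trailing slash.

```python
from lxml.html import fromstring

# Assumed base URL (note the trailing slash) and a made-up page body.
base_url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
page = (
    "<html><body><table><tr><td>"
    '<a href="itunes20170107.tbz">itunes20170107.tbz</a>'
    "</td></tr></table></body></html>"
)

doc = fromstring(page)
# Resolve every link in the document against base_url, in place.
doc.make_links_absolute(base_url)

# The same XPath as in the question now yields an absolute href.
href = doc.xpath('//td/a[starts-with(text(),"itunes")]')[0].get("href")
print(href)
```

After make_links_absolute() has run, any later XPath query on the same document returns absolute hrefs, so there is no need to call urljoin() per link.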

2017-06-16 09:44:43
