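Getting the href of a link

I am using lxml with Python. I want to get the href of the link whose text is "More reviews (40)" on this page. I am basically scraping this site and want to get the reviews.

Any help would be appreciated. Thanks.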

2012-03-27 Zain Khan

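The link is added by client-side JavaScript, so you cannot get the href with ordinary HTML parsing. You can, however, look at the JavaScript code in the page source and extract the link from there: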

>>> import re 
>>> import urllib2 
>>> import lxml.html 
>>> page = urllib2.urlopen("http://maps.google.com/maps/place?cid=2860002122405830765").read() 

# have to search the page source since the link is added in javascript 
>>> mo = re.search(r'<div class="pp-more-reviews">.*?</div>', page) 
>>> div = lxml.html.fromstring(mo.group(0)) 
>>> href = div.find("a").attrib["href"]
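
The session above targets Python 2 (`urllib2`). As a minimal sketch of the same regex-then-parse idea on Python 3, the version below works on a page source string that has already been downloaded. The function name `extract_more_reviews_href` and the sample markup are hypothetical, made up for illustration; the real page may be structured differently.

```python
import re

import lxml.html


def extract_more_reviews_href(page_source):
    """Return the href of the first link in the pp-more-reviews div, or None."""
    # re.DOTALL lets '.' match newlines in case the div spans several lines.
    mo = re.search(r'<div class="pp-more-reviews">.*?</div>', page_source, re.DOTALL)
    if mo is None:
        return None
    # Parse only the matched fragment, not the whole page.
    div = lxml.html.fromstring(mo.group(0))
    link = div.find(".//a")
    return None if link is None else link.get("href")


# Hypothetical page source: the markup lives inside a script, not in the DOM.
sample = (
    "<script>\n"
    "var html = '<div class=\"pp-more-reviews\">"
    "<a class=\"pp-more-content-link\" href=\"/maps/reviews?cid=123\">"
    "More reviews (40)</a></div>';\n"
    "</script>"
)

print(extract_more_reviews_href(sample))
```

Returning `None` instead of raising keeps the caller in control when the markup is missing, which is likely once the site changes its layout.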

Other options are:

Use Selenium to control a real browser.
Use PhantomJS to simulate a browser.
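
The browser-driven option could look roughly like the sketch below, assuming Selenium 4 with Firefox and its driver installed. The function name `href_via_browser` and the CSS selector are hypothetical; the selector is only a guess based on the class name used in this answer.

```python
def href_via_browser(url, selector="div.pp-more-reviews a", timeout=10):
    """Load url in a real browser and return the href of the first match."""
    # Imported here so that defining the function does not require Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and geckodriver are available
    try:
        driver.get(url)
        # Wait until the page's JavaScript has inserted the link into the DOM.
        link = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        return link.get_attribute("href")
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Because the browser runs the page's scripts, this reads the link from the live DOM rather than from the raw source, at the cost of being much slower than a plain HTTP request.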


2012-03-27 08:22:33 codeape

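Thanks for the great help! lxml was a requirement :P – 2012-03-27 08:33:31

Could you also help me with a similar problem on the next page? I want to get the *X out of Y people found this review helpful* line, which appears for each review. Thanks – 2012-03-27 09:21:49

I tried doing it the following way. It is not very elegant, but it serves the purpose: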

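import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead
from lxml import html
response = urllib.urlopen('http://maps.google.com/maps/place?cid=7101561317478851901').read()
dom = html.fromstring(response)
# xpath('@href') returns a list of attribute values, not a single string
href = dom.find_class('pp-more-reviews')[0].find_class('pp-more-content-link')[0].xpath('@href')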
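
The chained `[0]` lookups above raise `IndexError` when either class is missing from the page. A slightly more defensive sketch of the same `find_class` approach, written for Python 3 and run against made-up sample markup, might look like this. The function name `more_reviews_href` and the sample HTML are hypothetical.

```python
from lxml import html


def more_reviews_href(page_source):
    """Return the href of the first pp-more-content-link inside pp-more-reviews."""
    dom = html.fromstring(page_source)
    # find_class returns every element carrying the given CSS class.
    for container in dom.find_class("pp-more-reviews"):
        for link in container.find_class("pp-more-content-link"):
            href = link.get("href")
            if href:
                return href
    # Nothing matched: the markup is absent or was added later by JavaScript.
    return None


# Hypothetical markup for illustration only.
sample = """<html><body>
<div class="pp-more-reviews">
  <a class="pp-more-content-link" href="/maps/reviews?cid=123">More reviews (40)</a>
</div>
</body></html>"""

print(more_reviews_href(sample))
```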


2012-03-27 08:32:59
