2012-09-11 48 views

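Problem using Python mechanize with Google search

I recently discovered the Python library mechanize, and I want to use it to get the links from a Google search, but I can't make sense of the output. Here is my code snippet: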

import mechanize

br = mechanize.Browser()
# Send a browser-like User-Agent header
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Do not fetch or obey robots.txt
br.set_handle_robots(False)
url = 'https://www.google.com/search?num=10&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e'

response = br.open(url)

# br.links() yields one Link object for every anchor on the page
links = list(br.links())

It runs fine, but the output looks like this:

[ 
Link(base_url='https://www.google.com/search?num=10&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e', url='/support/websearch/bin/answer.py?answer=186645&form=bb&hl=en', text='Learn more', tag='a', attrs=[('href', '/support/websearch/bin/answer.py?answer=186645&form=bb&hl=en')]), 
Link(base_url='https://www.google.com/search?num=10&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e', url='http://www.google.com/intl/en/options/', text='More', tag='a', attrs=[('class', 'gbgt'), ('id', 'gbztm'), ('href', 'http://www.google.com/intl/en/options/'), ('onclick', 'gbar.tg(event,this)'), ('aria-haspopup', 'true'), ('aria-owns', 'gbd')]), 
Link(base_url='https://www.google.com/search?num=10&hl=en&site=&q=dog&oq=dog&aq=f&aqi=g10&aql=1&gs_sm=e', url='/webhp?hl=en&tab=ww', text='', tag='a', attrs=[('href', '/webhp?hl=en&tab=ww'), ('onclick', 'gbar.logger.il(39)'), ('title', 'Go to Google Home')]), 
..., 
] 

How do I get the actual URLs instead of this "click me" style of response?

Thanks!

Answer


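You are pulling every link on the page, so you need to filter them down to the relevant search-result links. I think this will do what you want: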

links = [link for link in br.links() if ('class', 'l') in link.attrs]

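The main search-result links all carry class=l as an attribute. I'm not familiar enough with mechanize to know whether you can do this filtering inside the links() call itself.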
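
To go from the filtered Link objects to plain URL strings, something like the sketch below may help. It does not touch the network or import mechanize. Instead it uses a hypothetical stand-in, FakeLink, with the same fields that appear in the printed output above (base_url, url, text, tag, attrs). The helper name result_urls and the sample URLs are also made up for illustration. Relative hrefs are resolved against base_url with the standard library's urljoin.

```python
from collections import namedtuple

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

# Hypothetical stand-in mirroring the fields mechanize prints for a Link
FakeLink = namedtuple("FakeLink", "base_url url text tag attrs")


def result_urls(links, css_class="l"):
    """Return absolute URLs of links whose class attribute equals css_class."""
    urls = []
    for link in links:
        # attrs is a list of (name, value) tuples, as in the output above
        if ("class", css_class) in link.attrs:
            # Resolve relative hrefs against the page the link came from
            urls.append(urljoin(link.base_url, link.url))
    return urls


base = "https://www.google.com/search?q=dog"
sample = [
    # Navigation link without class=l: filtered out
    FakeLink(base, "/webhp?hl=en&tab=ww", "", "a",
             [("href", "/webhp?hl=en&tab=ww")]),
    # Absolute result link: kept unchanged
    FakeLink(base, "http://example.com/dogs", "Dogs", "a",
             [("href", "http://example.com/dogs"), ("class", "l")]),
    # Relative result link: kept and made absolute
    FakeLink(base, "/relative/result", "Relative", "a",
             [("class", "l"), ("href", "/relative/result")]),
]

print(result_urls(sample))
```

With a real browser object the call would presumably be result_urls(br.links()). The match is exact, so a link whose class value lists several names would be skipped, and the class=l marker reflects Google's markup at the time of the answer, so it may no longer apply. As for filtering inside links() itself: as far as I know it accepts the same keyword arguments as find_link(), including a predicate callable, but that is worth checking against the documentation for your installed version.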