How do I extract the URL from this HTML tag?
I am trying to get every URL whose link has id='revSAR' from the HTML tag below, using a Python regular expression:
<a id='revSAR' href='http://www.amazon.com/Altec-Lansing-inMotion-Mobile-Speaker/product-reviews/B000EDKP8U/ref=cm_cr_dp_see_all_summary?ie=UTF8&showViewpoints=1&sortBy=byRankDescending' class='txtsmall noTextDecoration'>
See all 136 customer reviews
</a>
I tried the following code, but it doesn't work (it prints no results):
import re
regex = b'<a id="revSAR" href="(.+?)" class="txtsmall noTextDecoration">(.+?)</a>'
pattern = re.compile(regex)
rev_url = re.findall(pattern, txt)
print('reviews url: ' + str(rev_url))
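The attempt above likely fails for a few reasons. The pattern expects double quotes while the HTML uses single quotes. A bytes pattern can only be matched against bytes (in Python 3, using it on a `str` raises `TypeError`). The link text spans several lines, and `.` does not match a newline unless `re.DOTALL` is set. A minimal corrected sketch, assuming `txt` is a `str` holding the page HTML (the sample string here is just the snippet from the question):

```python
import re

# Sample input: the snippet from the question, assumed to be a str (not bytes).
txt = """<a id='revSAR' href='http://www.amazon.com/Altec-Lansing-inMotion-Mobile-Speaker/product-reviews/B000EDKP8U/ref=cm_cr_dp_see_all_summary?ie=UTF8&showViewpoints=1&sortBy=byRankDescending' class='txtsmall noTextDecoration'>
See all 136 customer reviews
</a>"""

# Group 1 and 2 capture the quote character so either ' or " is accepted,
# group 3 is the href value, [^>]* skips any remaining attributes, and
# re.DOTALL lets the non-greedy link-text group (4) cross newlines.
pattern = re.compile(
    r"""<a\s+id=(['"])revSAR\1\s+href=(['"])(.+?)\2[^>]*>(.*?)</a>""",
    re.DOTALL,
)

# finditer + group(3) returns just the URLs (findall would return tuples
# of all four groups here).
rev_urls = [m.group(3) for m in pattern.finditer(txt)]
print('reviews url: ' + str(rev_urls))
```

This still assumes `id` comes before `href` in the tag; a regex gets brittle as soon as attribute order or spacing varies, which is why the comments below point to an HTML parser.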
An example of parsing links with BeautifulSoup: https://groups.google.com/forum/?fromgroups#!topic/beautifulsoup/8TbctreqvSI – Paul
Or: http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautiful-soup – Paul
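Following the commenters' suggestion to use an HTML parser rather than a regex: with BeautifulSoup the lookup is roughly `[a['href'] for a in BeautifulSoup(txt, 'html.parser').find_all('a', id='revSAR')]`. The sketch below does the same thing with the standard library's `html.parser.HTMLParser`, swapped in here only to avoid the third-party dependency. The class name `RevSARLinkParser` and the helper `extract_revsar_urls` are illustrative, not taken from the linked answers.

```python
from html.parser import HTMLParser


class RevSARLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose id is 'revSAR' (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; quotes are already stripped,
        # so single- vs double-quoted attributes and attribute order don't matter.
        if tag != 'a':
            return
        attr_map = dict(attrs)
        if attr_map.get('id') == 'revSAR' and attr_map.get('href'):
            self.urls.append(attr_map['href'])


def extract_revsar_urls(html):
    """Return all revSAR link URLs found in an HTML string."""
    parser = RevSARLinkParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Sample input: the snippet from the question.
txt = """<a id='revSAR' href='http://www.amazon.com/Altec-Lansing-inMotion-Mobile-Speaker/product-reviews/B000EDKP8U/ref=cm_cr_dp_see_all_summary?ie=UTF8&showViewpoints=1&sortBy=byRankDescending' class='txtsmall noTextDecoration'>
See all 136 customer reviews
</a>"""

print('reviews url: ' + str(extract_revsar_urls(txt)))
```

Because the parser works on tags and attributes instead of raw text, it keeps working if the page reorders attributes or switches quote styles.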