How can I scrape this with Beautiful Soup in Python?

<a href="http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html">Why Multi-armed Bandit algorithms are superior to A/B testing (with Math)</a>, <a href="user?id=yummyfajitas">yummyfajitas</a>, <a href="item?id=4060658">11 comments</a>,

How can I scrape an HTML page whose content is the HTML above and get the data out like this:

link = http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html 
text = Why Multi-armed Bandit algorithms are superior to A/B testing (with Math) 
user_id = yummyfajitas 
item_id = 4060658

Source

2012-06-03 Hick

If the links always appear in this same order:

html = r'<a href="http://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html">Why Multi-armed Bandit algorithms are superior to A/B testing (with Math)</a>, <a href="user?id=yummyfajitas">yummyfajitas</a>, <a href="item?id=4060658">11 comments</a>, ' 

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')  # parse the HTML
bowl = soup.find_all('a')  # find all links in the HTML

link = bowl[0]['href']  # href of the first 'a' tag
text = bowl[0].contents[0]  # text of the first 'a' tag
user_id = bowl[1]['href'].split('?id=')[1]  # split on '?id=' and take the second part; [-1] would work too
item_id = bowl[2]['href'].split('?id=')[1]

print('link:', link)
print('text:', text)
print('user_id:', user_id)
print('item_id:', item_id)
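
Splitting on '?id=' works for the hrefs shown here, but it would return extra text if a link ever carried a second query parameter. As a rough sketch, assuming Python 3, the id could instead be read with the standard library's urllib.parse. The helper name extract_id is hypothetical and not part of the original answer; the hrefs are the ones from the question.

```python
from urllib.parse import urlparse, parse_qs


def extract_id(href):
    """Return the value of the `id` query parameter in href, or None if absent."""
    query = urlparse(href).query          # e.g. 'id=yummyfajitas'
    values = parse_qs(query).get("id")    # list of values for 'id', or None
    return values[0] if values else None


# hrefs as they appear in the question's HTML
user_id = extract_id("user?id=yummyfajitas")
item_id = extract_id("item?id=4060658")

print("user_id:", user_id)
print("item_id:", item_id)
```

The same function can be applied to any href pulled from the parsed links, so it does not depend on which HTML parser produced them.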

Source

2012-06-03 16:27:36 TankorSmash
