How do I get the a tags inside a div tag in Python?
I am scraping a website with Python: http://i.cantonfair.org.cn/en/expexhibitorlist.aspx?categoryno=411. I want to get the links inside a div tag that contains a tags like these:
<div id="main_category">
<div class="tit1"><a href="#" onclick="ExpandStage(1);"><strong>Phase 1</strong><br />April 15 - 19</a></div>
<ul id="phase1">
<li><a href="expexhibitorlist.aspx?categoryno=411">Consumer Electronics and Information Products</a></li>
<li><a href="expexhibitorlist.aspx?categoryno=412">Electronic and Electrical Products</a></li>
I only want the a tags that look like this:
<a href="expexhibitorlist.aspx?categoryno=411">Consumer Electronics and Information Products</a>
Also, how can I find those URLs using a regular expression?
This is what I tried:
from bs4 import BeautifulSoup
import re
import urllib.request
r = urllib.request.urlopen('http://i.cantonfair.org.cn/en/expexhibitorlist.aspx?categoryno=410').read()
soup = BeautifulSoup(r, "html.parser")
letters = soup.find_all("div", {"id": "main_category"})
for element in letters:
    categories = element.a.get_text()
    print(categories)
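For reference, here is a minimal sketch of one possible approach, not a definitive answer. In the attempt above, element.a returns only the first a tag inside the div, so the category links are never reached. The sketch calls find_all on the container and passes a compiled regular expression as the href filter. It parses the HTML snippet from the question as a string instead of fetching the live page; the closing ul and div tags and the exact href pattern are assumptions based on that snippet.

```python
import re
from bs4 import BeautifulSoup

# Markup copied from the question. The closing </ul> and </div> are added
# here as an assumption so that the fragment is complete.
html = """
<div id="main_category">
<div class="tit1"><a href="#" onclick="ExpandStage(1);"><strong>Phase 1</strong><br />April 15 - 19</a></div>
<ul id="phase1">
<li><a href="expexhibitorlist.aspx?categoryno=411">Consumer Electronics and Information Products</a></li>
<li><a href="expexhibitorlist.aspx?categoryno=412">Electronic and Electrical Products</a></li>
</ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", id="main_category")

# Assumed pattern: category links look like expexhibitorlist.aspx?categoryno=<digits>.
pattern = re.compile(r"^expexhibitorlist\.aspx\?categoryno=\d+$")

# find_all searches every descendant <a>; the href filter drops the "#" link.
links = container.find_all("a", href=pattern)
for link in links:
    print(link["href"], link.get_text(strip=True))

# Regex-only alternative on the raw markup (more fragile than using a parser).
hrefs = re.findall(r'href="(expexhibitorlist\.aspx\?categoryno=\d+)"', html)
```

To run this against the live page, the html string could be replaced with the bytes returned by urllib.request.urlopen(...).read(), as in the attempt above. The hrefs are relative, so urllib.parse.urljoin with the page URL would turn them into absolute URLs.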