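How to scrape a website for specific links using a regular expression in Python
I am using Python 3.5.1 and BeautifulSoup, and I want to use a regular expression to find specific links on a page. My code: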
from bs4 import BeautifulSoup
import urllib.request
import re
r = urllib.request.urlopen('http://i.cantonfair.org.cn/en/expexhibitorlist.aspx?categoryno=404').read()
soup = BeautifulSoup(r,"html.parser")
links = soup.find_all("a", href=re.compile(r"ExpExhibitorList\.aspx\?categoryno=[0-9]+"))
# Collect the href attribute of every matching <a> tag
linksfromcategories = [link["href"] for link in links]
print(linksfromcategories)
I get every link that looks similar:
['/cn/ExpExhibitorList.aspx?categoryno=432', 'ExpExhibitorList.aspx?categoryno=432003']
But I do not want
'/cn/ExpExhibitorList.aspx?categoryno=432'
to be retrieved.
Why don't you want that link? It matches your regular expression, so you get it. Please explain more.
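To illustrate the point made in the comment, here is a minimal sketch using only the two hrefs shown in the question's output. It assumes the goal is to keep only hrefs that begin with ExpExhibitorList.aspx (the question does not say this explicitly). The names `unanchored` and `anchored` are made up for the example. The original pattern has no anchor, and a search-style match finds it anywhere in the string, so the '/cn/' link qualifies too; adding '^' is one possible way to exclude it.

```python
import re

# The two hrefs reported in the question's output.
hrefs = ['/cn/ExpExhibitorList.aspx?categoryno=432',
         'ExpExhibitorList.aspx?categoryno=432003']

# Pattern from the question: unanchored, so search() matches it
# anywhere inside the href, including after the '/cn/' prefix.
unanchored = re.compile(r"ExpExhibitorList\.aspx\?categoryno=[0-9]+")

# Assumed fix (not from the question): '^' requires the href to
# start with the file name, which rules out the '/cn/' link.
anchored = re.compile(r"^ExpExhibitorList\.aspx\?categoryno=[0-9]+")

matched_unanchored = [h for h in hrefs if unanchored.search(h)]
matched_anchored = [h for h in hrefs if anchored.search(h)]

print(matched_unanchored)
print(matched_anchored)
```

If this reading of the question is right, the anchored pattern could probably be passed as href=re.compile(...) to soup.find_all in place of the original one, since BeautifulSoup applies a compiled pattern with search semantics.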