BeautifulSoup can't find an href that exists in the file
I have an HTML document like the following:
<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37">next_page</a>
<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" /> 1/3
</div>
</form>
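How do I extract the "1/3" from this document?
It is part of the HTML, to be clear. I'm new to BeautifulSoup; I have read the documentation but am still confused about how to get at it with BeautifulSoup.
The regular-expression attempt was:
total_urls_num = re.findall(r'\d+/\d+', response)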
Working code:
from BeautifulSoup import BeautifulSoup
import re

with open("html.txt", "r") as f:
    response = f.read()
print response
soup = BeautifulSoup(response)
# The '?' must be escaped; unescaped, it only makes the preceding 'w' optional
delete_urls = soup.findAll('a', href=re.compile(r'follow\?page'))
print delete_urls
#total_urls_num = re.findall(r'\d+/\d+', response)
# Looks up the submit <input>; the "1/3" is the text that follows it
total_urls_num = soup.find('input', type='submit')
print total_urls_num
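For reference, the commented-out regular expression can be checked on its own against the fragment from the question. This is only a minimal sketch, assuming Python 3 and the standard library; the variable names `html`, `page_counts`, `current` and `total` are chosen here for illustration, and the fragment is inlined instead of being read from html.txt.

```python
import re

# Assumption: the fragment from the question is inlined here rather than read from html.txt.
html = '''<form action="/2811457/follow?gsid=3_5bce9b871484d3af90c89f37" method="post">
<div>
<a href="/2811457/follow?page=2&gsid=3_5bce9b871484d3af90c89f37">next_page</a>
<input name="mp" type="hidden" value="3" />
<input type="text" name="page" size="2" style='-wap-input-format: "*N"' />
<input type="submit" value="jump" /> 1/3
</div>
</form>'''

# Raw string so the backslashes reach the regex engine unchanged.
page_counts = re.findall(r'\d+/\d+', html)
print(page_counts)

# Split the first match into the current page and the total number of pages.
current, total = (int(n) for n in page_counts[0].split('/'))
print(current, total)
```

The pattern matches any digits/digits pair anywhere in the text, so on a page that also contains dates or fractions it may return more than one hit.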
A small point about (.*\d/\d*): it is '\D', not '/D' – JBernardo
But when I change it, it still doesn't work; it returns None – young001
How about soup.find('input', value='jump').next? –
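The suggestion above relies on "1/3" being the text that directly follows the submit input. The same idea can be sketched without BeautifulSoup, using html.parser from the Python 3 standard library. The class name TextAfterSubmit and the shortened, inlined fragment are assumptions made for illustration; they are not part of the original code.

```python
from html.parser import HTMLParser


class TextAfterSubmit(HTMLParser):
    """Hypothetical helper: keep the first non-empty text after <input type="submit">."""

    def __init__(self):
        super().__init__()
        self.seen_submit = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also arrive here by default.
        if tag == 'input' and dict(attrs).get('type') == 'submit':
            self.seen_submit = True

    def handle_data(self, data):
        # Record only the first non-blank text seen after the submit input.
        if self.seen_submit and self.text is None and data.strip():
            self.text = data.strip()


# Assumption: a shortened copy of the fragment from the question.
fragment = '''<div>
<input type="text" name="page" size="2" />
<input type="submit" value="jump" /> 1/3
</div>'''

parser = TextAfterSubmit()
parser.feed(fragment)
print(parser.text)
```

Anchoring on the submit input ties the result to that spot in the form, so other digits/digits text elsewhere on the page is not picked up.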