Python BeautifulSoup: getting the footer from a Chinese website

I'm trying to get data from a Chinese website. I've found where it sits in the HTML, but I need help pulling out the text. So far I have:

from bs4 import BeautifulSoup
import requests

page = 'http://sbj.speiyou.com/search/index/subject:/grade:12/gtype:time'
r = requests.get(page)

r.encoding = 'utf-8'  # decode the Chinese page as UTF-8
soup = BeautifulSoup(r.text, 'html.parser')  # name the parser explicitly

div = soup.find('div', class_='pagination mtop40')

The data I'm looking for is the 16 in 1/16.


2014-04-03 jason

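Running a regular expression over div.text is one option. The following regex matches a run of digits, followed by a forward slash, followed by more digits.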

import re
pattern = re.compile(r'\d+/\d+')  # digits, a slash, more digits
match = pattern.search(div.text)
num = match.group(0)  # num = '1/16' here
print(num.split('/')[1])  # prints 16

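Or capture the needed number directly with a group in the pattern:

import re
pattern = re.compile(r'\d+/(\d+)')  # group the needed data in the regex pattern
match = pattern.search(div.text)
print(match.group(1))  # group(1) is the captured total, 16; group(0) would be the whole '1/16'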
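Both snippets call .group() directly on the search result, which raises AttributeError if the text contains no digits/digits run. A minimal sketch that checks for a missing match and returns both numbers as integers; the function name parse_counter and the sample strings are hypothetical, not taken from the real page:

```python
import re
from typing import Optional, Tuple

# Capture both numbers of a "current/total" counter such as "1/16".
COUNTER = re.compile(r"(\d+)/(\d+)")


def parse_counter(text: str) -> Optional[Tuple[int, int]]:
    """Return (current, total) as ints, or None when no counter is present."""
    match = COUNTER.search(text)
    if match is None:
        # No "digits/digits" run in the text; avoid calling .group() on None.
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical footer text; the real page's wording may differ.
print(parse_counter("第 1/16 页"))  # (1, 16)
print(parse_counter("没有分页"))     # None
```

Returning a tuple keeps the current page available as well, which is handy if you later loop over every page.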


2014-04-03 15:39:44 shaktimaan

How do I get the text, though? 'div.text' gives me an error: AttributeError: 'NoneType' object has no attribute 'text' – jason

@jason_cant_code The error means the variable div is None. Does your code still have div = soup.find('div', class_='pagination mtop40')? – shaktimaan
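As the comment says, that AttributeError means soup.find returned None because no matching div was found. A minimal sketch that checks for that before reading the text, run against a hard-coded snippet of pagination markup; SAMPLE_HTML and the helper name pagination_text are hypothetical, and the live page's HTML may differ:

```python
from typing import Optional

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page's pagination footer.
SAMPLE_HTML = """
<html><body>
  <div class="pagination mtop40"><span>1/16</span> <a href="#">下一页</a></div>
</body></html>
"""


def pagination_text(html: str) -> Optional[str]:
    """Return the pagination div's text, or None if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # A space-separated class_ value matches the exact class attribute string.
    div = soup.find("div", class_="pagination mtop40")
    if div is None:
        # Without this check, div.text would raise AttributeError on None.
        return None
    return div.get_text(" ", strip=True)


print(pagination_text(SAMPLE_HTML))
print(pagination_text("<div>no pager here</div>"))  # None
```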

I figured it out. It was a library I hadn't installed. – jason
