2012-11-15 61 views

I have a "problem" with BeautifulSoup extraction, and more particularly with the re module. Here is the code:

import re 

from bs4 import BeautifulSoup 

string = """ 
<div id="my_id"> 
    <ul> 
     <li>something</li> 
     <li class="color12">something</li> 
     <li class="color45">something else</li> 
    </ul> 
</div> 
""" 
soup = BeautifulSoup(string, 'html.parser') 
li = soup.find_all('li', {'class': re.compile(r'color(\d+)')}) 
for ele in li: 
    print(ele['class'])  # will print colorXXXX, but I would like to know how to get only this XXXX 

However, I would like to extract only the number that follows "color". Is that possible, or am I obliged to use something like this:

match = re.search(r'color(\d+)', str(ele['class'])) 
if match: 
    print(match.group(1)) 

Thank you for your help :)

Answers


You have to re-apply the regular expression. Just store it in a variable and reuse it:

colorpattern = re.compile(r'color(\d+)') 

li = soup.find_all('li', {'class': colorpattern}) 
for ele in li: 
    # ele['class'] is a list of class names in bs4, so join it before searching 
    print(colorpattern.search(' '.join(ele['class'])).group(1)) 
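
Since bs4 returns a multi-valued attribute such as class as a list of strings, the pattern can also be applied to each class name in turn. The following is only a minimal sketch of that idea: the helper name `color_number` and the sample lists are hypothetical, and the lists merely stand in for what `ele['class']` would yield.

```python
import re

# Same pattern as in the answer, compiled once and reused.
COLOR_PATTERN = re.compile(r'color(\d+)')

def color_number(classes):
    """Return the digits after 'color' in the first matching class name, or None."""
    for name in classes:
        match = COLOR_PATTERN.search(name)
        if match:
            return match.group(1)
    return None

# Hypothetical stand-ins for the lists that ele['class'] would yield.
samples = [[], ['color12'], ['color45', 'selected']]
numbers = [color_number(classes) for classes in samples]
print(numbers)
```

With a real tag, something like `color_number(ele.get('class', []))` would avoid a KeyError on elements that have no class attribute.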

Thanks for your help =) – mosqui


It doesn't matter in this case, but in general use raw string literals (r'') if the regular expression contains backslashes. – jfs


@JFSebastian: I should have noticed that when adapting the OP's text. –
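
To illustrate the point about raw strings, here is a small sketch using an escape that does change meaning. The `\b` word boundary and the sample text are assumptions added for the example and are not part of the original pattern.

```python
import re

text = 'class="color12"'

# In a plain literal, '\b' is the backspace character (\x08), not a word boundary.
plain = re.compile('\bcolor(\\d+)')
# In a raw literal, the backslashes reach the regex engine untouched.
raw = re.compile(r'\bcolor(\d+)')

plain_match = plain.search(text)  # the text contains no backspace character
raw_match = raw.search(text)      # word boundary between '"' and 'c'

print(plain_match)
print(raw_match.group(1))
```

The pattern `color(\d+)` happens to work without the `r` prefix because `\d` is not a recognized string escape, but newer Python versions warn about such invalid escape sequences, so the raw form is the safer habit.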