Python encoding problem with a regular expression

I am trying to extract this line from a page:

          $ 55 326

I wrote this regular expression to get the numbers:

player_info['salary'] = re.compile(r'\$ \d{0,3} \d{1,3}')

I get the text with bs4, and the type of the text is 'unicode':

for a in soup_ntr.find_all('div', id='playerbox'):
    player_box_text = a.get_text()
    print(type(player_box_text))

I can't seem to get a result. I have also tried regular expressions like these:

player_info['salary'] = re.compile(ur'\$ \d{0,3} \d{1,3}')
player_info['salary'] = re.compile(ur'\$ \d{0,3} \d{1,3}', re.UNICODE)

But I can't figure out how to get the data. The page I am reading has this header:

Content-Type: text/html; charset=utf-8

I hope someone can help me figure this out.

2012-10-05 jantzen05

This is a good site for getting to grips with regular expressions: http://txt2re.com/

#!/usr/bin/python 
# URL that generated this code: 
# http://txt2re.com/index-python.php3?s=$%2055%20326&2&1 

import re 

txt='$ 55 326' 
re1='.*?' # Non-greedy match on filler 
re2='(\\d+)' # Integer Number 1 
re3='.*?' # Non-greedy match on filler 
re4='(\\d+)' # Integer Number 2 

rg = re.compile(re1+re2+re3+re4,re.IGNORECASE|re.DOTALL) 
m = rg.search(txt) 
if m: 
    int1=m.group(1) 
    int2=m.group(2) 
    print("(" + int1 + ")" + "(" + int2 + ")" + "\n")

2012-10-05 22:02:40

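I tried the expression, but I think I am failing at the UTF-8/Unicode handling. If I change the whitespace, my expression finds the data. I really don't know how to get it. – jantzen05

This works fine, but it also captures some other things, such as the $ cp inside a word like 00 $ cphCon. – jantzen05

You can make the regular expression as complex as you need. If you know the input format of the data is reliable, the expression can be as simple as whatever extracts the string reliably every time. Here you know you only want the $ sign and the digits that appear in the string. That is quite possible with a slightly more specific regular expression.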
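One way to tighten the pattern, shown only as a sketch: anchor it on the literal $ and use \s instead of a plain space. This assumes the gaps in the scraped text are non-breaking spaces (U+00A0), which the thread hints at but never confirms. The sample string and the name salary_re are hypothetical.

```python
import re

# Hypothetical sample text. The gaps around the amount are U+00A0
# (non-breaking space); that is an assumption, not something the page confirms.
player_box_text = u"Salary:\u00a0$\u00a055\u00a0326 and 00 $ cphCon"

# \s matches U+00A0 as well as an ordinary space when the text is Unicode.
# re.UNICODE is needed for that on Python 2 and is already the default
# for str patterns on Python 3.
salary_re = re.compile(r'\$\s(\d{0,3})\s(\d{1,3})', re.UNICODE)

match = salary_re.search(player_box_text)
salary = int(match.group(1) + match.group(2)) if match else None
print(salary)
```

Because the pattern requires whitespace and then digits after the $, a fragment such as "$ cphCon" is not picked up.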

re.compile does not match anything. It only creates a compiled version of the regular expression.

You want something like this:

# re.search scans the whole string; re.match would only try at the very start
matchObj = re.search(r'\$ (\d{0,3}) (\d{1,3})', player_box_text)
if matchObj:
    player_info['salary'] = matchObj.group(1) + matchObj.group(2)
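For the compile-first workflow, a minimal sketch might look like the following. The sample text stands in for the output of a.get_text() and is an assumption, as are the names salary_re and match_obj.

```python
import re

# Compiling only builds a reusable pattern object; nothing is matched yet.
salary_re = re.compile(r'\$ (\d{0,3}) (\d{1,3})')

# Hypothetical stand-in for a.get_text(); note the leading spaces.
player_box_text = u'          $ 55 326'

player_info = {}

# search() scans the whole string for the first place the pattern fits.
match_obj = salary_re.search(player_box_text)
if match_obj is not None:
    player_info['salary'] = match_obj.group(1) + match_obj.group(2)

# match() only tries at position 0, so the leading spaces make it return None.
at_start = salary_re.match(player_box_text)

print(player_info)
```

Checking for None before calling group() avoids an AttributeError when the text does not contain the pattern.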

2012-10-05 22:03:31 Cfreak

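Sorry about the use of compile. I actually call re.search later with the compiled version. My trouble is that I can find some of the data while other data fails, because I don't know how to get the data in the correct encoding. – jantzen05

I see your point. I am actually using re.search. I create the expression first and then call re.search with it. – jantzen05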
