I'm trying to get proxies out of a web page using regex in Python

import urllib.request 
import re 
page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read() 
re.findall('\d+\.\d+\.\d+\.\d+', page)

I don't understand why it says:

File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
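The error can be reproduced without a network connection. A minimal sketch, assuming a hard-coded bytes literal (the hypothetical `page` value below) stands in for what `urlopen(...).read()` returns in Python 3:

```python
import re

# Hypothetical stand-in for urlopen(...).read(): in Python 3 that call returns bytes, not str.
page = b"<td>056.249.66.50</td><td>100.44.124.8</td>"

error_message = None
try:
    # A str pattern cannot be matched against a bytes subject, so re raises TypeError.
    re.findall(r"\d+\.\d+\.\d+\.\d+", page)
except TypeError as exc:
    error_message = str(exc)

print(type(page).__name__, "->", error_message)
```

The exact wording varies slightly between Python versions ("can't" vs "cannot"), but the cause is the same: the pattern is `str` and the subject is `bytes`.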


2013-04-27 Teli Kaufman

Have you tried adding a "u" before the string? – 2013-04-27 22:12:07
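In Python 3 the `u` prefix does not change anything here: since Python 3.3 (PEP 414) it is accepted again, but a `u'...'` literal is just an ordinary `str`, so the pattern is still text while the page is still bytes. A minimal check, assuming a short hypothetical bytes value in place of the downloaded page:

```python
import re

# The u prefix is a no-op in Python 3: the literal is an ordinary str.
# (A combined ur"..." prefix is not allowed in Python 3, so backslashes are doubled here.)
u_pattern = u"\\d+\\.\\d+\\.\\d+\\.\\d+"
plain_pattern = r"\d+\.\d+\.\d+\.\d+"

same_type = type(u_pattern) is type(plain_pattern) is str
same_value = u_pattern == plain_pattern

page = b"056.249.66.50"  # hypothetical bytes, like urlopen(...).read() returns
try:
    re.findall(u_pattern, page)
    still_fails = False
except TypeError:
    # Still a str pattern against a bytes subject.
    still_fails = True

print(same_type, same_value, still_fails)
```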

This might help: http://nedbatchelder.com/text/unipain.html – 2013-04-27 22:14:55

import urllib 
import re 
page = urllib.urlopen("http://www.samair.ru/proxy/ip-address-01.htm").read() 
print re.findall('\d+\.\d+\.\d+\.\d+', page)

This works for me and gives the result:

['056.249.66.50', '100.44.124.8', '103.31.250.115', ...

Edit:

This works with Python 2.7.


2013-04-27 22:14:51

He's using Python 3. – Cairnarvon 2013-04-27 22:15:52

So the day has come. I need to switch to Py3... – 2013-04-27 22:23:14

Oh, I have Python 3.3.0 – 2013-04-27 22:31:34

Reading the file-like object returned by urllib.request.urlopen gives you a bytes object. You can decode it into a Unicode string and use a Unicode (str) regex:

>>> re.findall('\d+\.\d+\.\d+\.\d+', page.decode('utf-8')) 
['056.249.66.50', '100.44.124.8', '103.31.250.115', '105.236.180.243', '105.236.21.213', '108.171.162.172', '109.207.61.143', '109.207.61.197', '109.207.61.202', '109.226.199.129', '109.232.112.109', '109.236.220.98', '110.196.42.33', '110.74.197.141', '110.77.183.64', '110.77.199.111', '110.77.200.248', '110.77.219.154', '110.77.219.2', '110.77.221.208']
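The decode-first approach can be wrapped in a small function. A minimal sketch, assuming the page is UTF-8 (as the decode call above does) and using a hypothetical helper name `find_ips_in_text` with a hard-coded sample instead of a live download. The pattern is written as a raw string so the backslashes reach `re` unchanged:

```python
import re

# str pattern; the raw string keeps each backslash intact for the regex engine.
IP_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")


def find_ips_in_text(page: bytes, encoding: str = "utf-8") -> list:
    """Decode the raw page bytes to str, then search with a str pattern."""
    text = page.decode(encoding)
    return IP_PATTERN.findall(text)


# Hypothetical sample bytes standing in for urlopen(...).read().
sample = b"<tr><td>056.249.66.50</td></tr><tr><td>100.44.124.8</td></tr>"
print(find_ips_in_text(sample))  # ['056.249.66.50', '100.44.124.8']
```

Pattern and subject are both `str` here, which is exactly the pairing `re` requires.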

...or use a bytes regex:

>>> re.findall(b'\d+\.\d+\.\d+\.\d+', page) 
[b'056.249.66.50', b'100.44.124.8', b'103.31.250.115', b'105.236.180.243', b'105.236.21.213', b'108.171.162.172', b'109.207.61.143', b'109.207.61.197', b'109.207.61.202', b'109.226.199.129', b'109.232.112.109', b'109.236.220.98', b'110.196.42.33', b'110.74.197.141', b'110.77.183.64', b'110.77.199.111', b'110.77.200.248', b'110.77.219.154', b'110.77.219.2', b'110.77.221.208']
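The bytes-pattern variant can be wrapped the same way. A minimal sketch, assuming a hypothetical helper `find_ips_in_bytes` and sample bytes in place of the real page. Because `\d` in a bytes pattern only matches the ASCII digits 0-9, each match can safely be decoded with `"ascii"` if `str` results are wanted afterwards:

```python
import re

# bytes pattern (rb prefix): matches directly against the undecoded page.
IP_PATTERN_BYTES = re.compile(rb"\d+\.\d+\.\d+\.\d+")


def find_ips_in_bytes(page: bytes) -> list:
    """Search the raw bytes, then decode each ASCII-only match to str."""
    return [match.decode("ascii") for match in IP_PATTERN_BYTES.findall(page)]


# Hypothetical sample bytes standing in for urlopen(...).read().
sample = b"<td>109.207.61.143</td><td>110.77.219.2</td>"
print(find_ips_in_bytes(sample))  # ['109.207.61.143', '110.77.219.2']
```

This avoids guessing the page's encoding at all, since only the ASCII matches are ever decoded.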

Use whichever data type you prefer to work with.
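If you take the decoding route, hard-coding `'utf-8'` is itself an assumption about the page. A minimal sketch that instead uses the charset declared in the response's `Content-Type` header and falls back to UTF-8 when none is given. `decode_page` and `fetch_ips` are hypothetical helper names, `errors="replace"` is a choice made here so stray bytes do not abort the search, and the sketch assumes the declared charset is one Python recognizes:

```python
import re
import urllib.request
from email.message import Message

IP_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")


def decode_page(body: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode body using the charset parameter of a Content-Type value, else `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset, or `default` if none is declared.
    charset = msg.get_content_charset(default)
    return body.decode(charset, errors="replace")


def fetch_ips(url: str) -> list:
    """Download `url`, decode it with the server-declared charset, and find IP-like strings."""
    with urllib.request.urlopen(url) as resp:
        # resp.headers is an http.client.HTTPMessage (an email.message.Message subclass).
        content_type = resp.headers.get("Content-Type", "text/html")
        body = resp.read()
    return IP_PATTERN.findall(decode_page(body, content_type))
```

For example, `decode_page(b"caf\xe9 1.2.3.4", "text/html; charset=ISO-8859-1")` decodes as Latin-1, while a bare `"text/html"` falls back to UTF-8.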


2013-04-27 22:22:24 Cairnarvon
