This is all I have managed so far! I want to get the proxies. How do I extract information from a column of a table on a website?
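import re
import urllib.request
page = urllib.request.urlopen("http://www.samair.ru/proxy/ip-address-01.htm")
# The response object is not callable, so run the regex over the decoded page text instead
print(re.findall(r'\d+\.\d+\.\d+\.\d+', page.read().decode('utf-8', errors='replace')))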
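In this case the table is not a real HTML table, just plain text wrapped in <pre></pre>. You can verify that by viewing the page source. Either way, with BeautifulSoup it is a walk in the park: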
In [1]: from bs4 import BeautifulSoup
In [2]: from urllib.request import urlopen
In [3]: bs = BeautifulSoup(urlopen('http://www.samair.ru/proxy/ip-address-01.htm'), 'html.parser')
In [4]: print(bs.find('pre').text)
IP address Anonymity level Checked time Country
056.249.66.50:8080 transparent Apr-21, 10:33 Bulgaria
1.63.18.22:8080 transparent Apr-21, 05:56 China
1.9.75.8:8080 transparent Apr-21, 12:58 Malaysia
103.247.219.165:8080 transparent Apr-21, 04:01 Indonesia
103.4.165.190:80 transparent Apr-21, 11:34 Indonesia
103.9.126.110:8080 transparent Apr-21, 12:19 Indonesia
109.173.98.64:8080 transparent Apr-20, 22:39 Russian Federation
109.197.194.142:8080 transparent Apr-21, 12:07 Russian Federation
109.207.61.141:8090 transparent Apr-21, 11:14 Poland
109.207.61.145:8090 transparent Apr-21, 13:04 Poland
109.207.61.149:8090 transparent Apr-21, 10:21 Poland
109.207.61.165:8090 transparent Apr-21, 03:57 Poland
109.207.61.170:8090 transparent Apr-21, 11:02 Poland
109.207.61.208:8090 transparent Apr-21, 10:45 Poland
109.224.55.46:80 transparent Apr-20, 21:50 Iraq
109.227.124.105:8080 transparent Apr-21, 09:57 Ukraine
109.69.6.118:8080 transparent Apr-21, 11:44 Albania
110.138.248.135:8080 transparent Apr-21, 09:10 Indonesia
110.139.13.121:8080 transparent Apr-21, 11:31 Indonesia
110.159.179.108:80 transparent Apr-20, 20:35 Malaysia
In [5]: [l.split()[0] for l in bs.find('pre').text.split('\n')[1:]][1:]
Out[5]:
['056.249.66.50:8080',
'1.63.18.22:8080',
'1.9.75.8:8080',
'103.247.219.165:8080',
'103.4.165.190:80',
'103.9.126.110:8080',
'109.173.98.64:8080',
'109.197.194.142:8080',
'109.207.61.141:8090',
'109.207.61.145:8090',
'109.207.61.149:8090',
'109.207.61.165:8090',
'109.207.61.170:8090',
'109.207.61.208:8090',
'109.224.55.46:80',
'109.227.124.105:8080',
'109.69.6.118:8080',
'110.138.248.135:8080',
'110.139.13.121:8080',
'110.159.179.108:80']
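If you would rather stick with the regex idea from the question, something along these lines might work. This is only a sketch: it runs on a few rows copied from the output above rather than on the live page, and the names `SAMPLE`, `PROXY_RE` and `extract_proxies` are made up for illustration.

```python
import re

# A few rows copied from the <pre> output above. On the live page you would
# pass bs.find('pre').text (or the decoded page source) instead.
SAMPLE = """IP address Anonymity level Checked time Country
1.63.18.22:8080 transparent Apr-21, 05:56 China
1.9.75.8:8080 transparent Apr-21, 12:58 Malaysia
103.4.165.190:80 transparent Apr-21, 11:34 Indonesia
"""

# Four dot-separated groups of 1-3 digits, then a colon and the port number.
PROXY_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}:\d+\b')


def extract_proxies(text):
    """Return every ip:port string found in text, in order of appearance."""
    return PROXY_RE.findall(text)


proxies = extract_proxies(SAMPLE)
print(proxies)
```

Matching on the `ip:port` shape means the header row and any blank lines are skipped automatically, so no slicing is needed. The pattern does not check that each group is at most 255.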
Traceback (most recent call last): File "C:\...\Desktop\ss.py", line 1, in ... No module named 'bs4' – user1567728 2013-04-24 22:24:29
Which version are you using? – user1567728 2013-04-24 22:30:14
@user1567728 The latest one.
http://docs.python.org/2/library/xml.dom.html – 2013-04-21 14:07:21