How to extract part of a web page in Python

Target page: http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm

The part I want to extract:

<tr> 
    <td>Skilled &ndash; Independent (Residence) subclass 885<br />online</td> 
    <td>N/A</td> 
    <td>N/A</td> 
    <td>N/A</td> 
    <td>15 May 2011</td> 
    <td>N/A</td> 
    </tr>

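Once the code has located this row by searching for the keyword "subclass 885 online", it should print the date in the fifth <td> tag, i.e. "15 May 2011", as shown above.

This is just a monitor so that I can keep an eye on the progress of my immigration application.

Source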

2011-08-14 jiaoziren

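Take a look at BeautifulSoup:

"Beau--ootiful Soo--oop!

Beau--ootiful Soo--oop!

Soo--oop of the e--e--evening,

Beautiful, beauti--FUL SOUP!"

--Lewis Carroll, Alice's Adventures in Wonderland

I think this is exactly what he had in mind!

The Mock Turtle would probably do something like this: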

>>> from BeautifulSoup import BeautifulSoup 
>>> import urllib2 
>>> url = 'http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm' 
>>> page = urllib2.urlopen(url) 
>>> soup = BeautifulSoup(page) 
>>> for row in soup.html.body.findAll('tr'): 
...  data = row.findAll('td') 
...  if data and 'subclass 885online' in data[0].text: 
...   print data[4].text 
... 
15 May 2011
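
For readers on Python 3, where urllib2 and the `from BeautifulSoup import BeautifulSoup` import are not available, the same loop might look roughly like the sketch below. It assumes the bs4 package (Beautiful Soup 4) is installed. The function name find_allocation_date, the inline SAMPLE_HTML, and the placeholder first row are made up for illustration and are not part of the original answer.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4 is installed

# Offline sample: a hypothetical placeholder row, then the row from the question.
SAMPLE_HTML = """
<table>
<tr><td>Placeholder row (hypothetical)</td>
    <td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td><td>N/A</td></tr>
<tr>
    <td>Skilled &ndash; Independent (Residence) subclass 885<br />online</td>
    <td>N/A</td>
    <td>N/A</td>
    <td>N/A</td>
    <td>15 May 2011</td>
    <td>N/A</td>
</tr>
</table>
"""


def find_allocation_date(html, keyword="subclass 885"):
    """Return the fifth cell of the first row whose first cell contains keyword."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        # Skip header rows or short rows that have no fifth cell.
        if len(cells) < 5:
            continue
        # Join text nodes with a space so the <br /> does not glue
        # "885" and "online" together.
        label = cells[0].get_text(" ", strip=True)
        if keyword in label:
            return cells[4].get_text(strip=True)
    return None


print(find_allocation_date(SAMPLE_HTML))
```

To try it on a live page, the bytes returned by urllib.request.urlopen(url).read() can be passed in place of SAMPLE_HTML. The sketch does not check whether the URL from the question still serves this table.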

But I'm not sure how much it will help, since that date has already passed!

Good luck with your application!

Source

2011-08-14 06:16:51 Johnsyweb

Thanks for the code, I always wondered what BSoup meant.... – jiaoziren

There is a library called Beautiful Soup that can do the job you are asking about. http://www.crummy.com/software/BeautifulSoup/

Source

2011-08-14 04:45:54

You might want to use this as a starting point:

Python 2.6.7 (r267:88850, Jun 13 2011, 22:03:32) 
[GCC 4.6.1 20110608 (prerelease)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
>>> import urllib2, re 
>>> from BeautifulSoup import BeautifulSoup 
>>> urllib2.urlopen('http://www.immi.gov.au/skilled/general-skilled-migration/estimated-allocation-times.htm') 
<addinfourl at 139158380 whose fp = <socket._fileobject object at 0x84aa2ac>> 
>>> html = _.read() 
>>> soup = BeautifulSoup(html) 
>>> soup.find(text = re.compile('\\bsubclass 885\\b')).parent.parent.find('td', text = re.compile(' [0-9]{4}$')) 
u'15 May 2011'
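
If installing a third-party parser is not an option, the same idea (match the row with one regular expression, then pick the cell that ends in a four-digit year) can be sketched with only the Python 3 standard library. This swaps BeautifulSoup for html.parser, so it is a different technique from the transcript above. The names RowCollector and find_date and the inline SAMPLE_HTML are hypothetical.

```python
import re
from html.parser import HTMLParser

# Offline copy of the row from the question.
SAMPLE_HTML = """
<tr>
    <td>Skilled &ndash; Independent (Residence) subclass 885<br />online</td>
    <td>N/A</td>
    <td>N/A</td>
    <td>N/A</td>
    <td>15 May 2011</td>
    <td>N/A</td>
</tr>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by enclosing <tr>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []
        elif tag == "br" and self._cell is not None:
            self._cell.append(" ")  # treat a line break as whitespace

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def find_date(html, row_pattern=r"\bsubclass 885\b", date_pattern=r" [0-9]{4}$"):
    """Return the first cell ending in a year from the row matching row_pattern."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    for cells in parser.rows:
        if cells and re.search(row_pattern, cells[0]):
            for cell in cells[1:]:
                if re.search(date_pattern, cell):
                    return cell
    return None


print(find_date(SAMPLE_HTML))
```

Unlike the fixed-index approach, this does not depend on the date sitting in the fifth column, but it will return whichever cell in the row first ends in a four-digit year.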

Source

2011-08-14 05:00:01

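Beginner here, but from what I've read, lxml is the preferred module for screen scraping. Is it just a matter of preference, or does either one offer a significant advantage? Also, your ratio is 30 to 1. – danem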

Hi Pete, I haven't used lxml; I'll have to take a look at it. You're right, I'm off, but in the right direction :^) –

Ah, I thought you wanted it in the other direction. What's wrong with asking questions? D: – danem
