Scraping from a web page - python

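I am very new to web programming with Python. Currently, I am trying to "scrape" a small piece of information from a website. Website: http://www.airport-data.com/airport/HJO/#location Information to extract/scrape: "Elevation" (under Location & QuickFacts)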

The code I have so far:

import urllib2
from BeautifulSoup import BeautifulSoup

url2 = urllib2.urlopen('http://www.airport-data.com/airport/HJO/#location').read()
soup = BeautifulSoup(url2)
print soup  # I did this just to see the content.

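I tried reading online and looked at some earlier posts, but I could not wrap my head around it. Any suggestions on how to go about extracting the "Elevation" value from that page? Thanks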

Source

2014-09-05 Nikhil Gupta

First, according to the BeautifulSoup project documentation:

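Beautiful Soup 3 has been replaced by Beautiful Soup 4. Beautiful Soup 3 only works on Python 2.x, but Beautiful Soup 4 also works on Python 3.x. Beautiful Soup 4 is faster, has more features, and works with third-party parsers like lxml and html5lib. You should use Beautiful Soup 4 for all new projects.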

Install BeautifulSoup version 4:

pip install beautifulsoup4

Then, the idea is to find the tag containing the Elevation: text and get the next sibling:

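import urllib2
from bs4 import BeautifulSoup

# Open the page; BeautifulSoup accepts the file-like response object directly
url2 = urllib2.urlopen('http://www.airport-data.com/airport/HJO/#location')
soup = BeautifulSoup(url2)

# Locate the label cell, then read the text of the cell right after it
print soup.find('td', class_='tc1', text='Elevation:').next_sibling.text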

Prints:

240 ft/73.15 m (Estimated)
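
The label-then-sibling lookup can also be tried offline. The sketch below is a minimal illustration, assuming markup shaped like the cells described above; `SAMPLE_HTML` and the helper name `value_after_label` are hypothetical and are not copied from the real page. It uses `find_next_sibling("td")` rather than `next_sibling`, which skips any whitespace text node that may sit between the two cells.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page's HTML may differ.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="tc1">Elevation:</td>
    <td class="tc1">240 ft/73.15 m (Estimated)</td>
  </tr>
</table>
"""


def value_after_label(html, label):
    """Return the text of the <td> that follows the <td> whose text is `label`."""
    soup = BeautifulSoup(html, "html.parser")
    label_cell = soup.find("td", string=label)
    if label_cell is None:
        return None  # label not present on the page
    value_cell = label_cell.find_next_sibling("td")
    if value_cell is None:
        return None  # label found, but no cell after it
    return value_cell.get_text(strip=True)


print(value_after_label(SAMPLE_HTML, "Elevation:"))
```

Returning `None` when the label is missing avoids the `AttributeError` that a chained `.find(...).next_sibling.text` raises if the page layout changes.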

Source

2014-09-05 19:02:49 alecxe

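Thanks for the answer. I did `soup.find('td', class_='tc0', text='Longitude/Latitude:').next_sibling.text`, and it is extracting the content. But the 'Longitude/Latitude:' value has two lines separated by `<br>`. How do I extract the second line, i.e. the content after the `<br>`? (I can get the second line through string manipulation, but I want to know whether I can extract it without string manipulation.) – 2014-09-05 19:20:24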

@SrinGupta Sure, `print soup.find('td', text='Longitude/Latitude:').next_sibling.contents[2]` should do it. – alecxe 2014-09-05 19:29:59

Awesome! Thanks – 2014-09-05 20:22:04
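
To see why index 2 picks the second line: a cell holding two strings separated by a `<br/>` has three children (text, the `<br/>` tag, text). The sketch below is only an illustration under assumed markup; `SAMPLE_HTML`, the placeholder cell text, and the helper name `cell_lines` are hypothetical. It also shows `stripped_strings` as an alternative that does not depend on the position of the `<br/>`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder text; the real page's HTML may differ.
SAMPLE_HTML = (
    '<table><tr>'
    '<td class="tc0">Longitude/Latitude:</td>'
    '<td class="tc0">first line<br/>second line</td>'
    '</tr></table>'
)


def cell_lines(html, label):
    """Return the text fragments of the <td> that follows the labelled <td>."""
    soup = BeautifulSoup(html, "html.parser")
    label_cell = soup.find("td", string=label)
    if label_cell is None:
        return []
    value_cell = label_cell.find_next_sibling("td")
    if value_cell is None:
        return []
    # stripped_strings yields each text node, skipping tags such as <br/>
    return list(value_cell.stripped_strings)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
value_cell = soup.find("td", string="Longitude/Latitude:").find_next_sibling("td")

# contents is [text, <br/>, text], so index 2 is the text after the <br/>
second_by_index = value_cell.contents[2]
lines = cell_lines(SAMPLE_HTML, "Longitude/Latitude:")

print(second_by_index)
print(lines)
```

Indexing `contents` is the shortest route when the cell layout is fixed; collecting `stripped_strings` is a little more tolerant if extra tags or whitespace appear inside the cell.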
