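BeautifulSoup Python - HTML table data problem
I need to extract a value from an HTML table that a web server serves as a txt file. Specifically, I need to get the most recent reading (by time) into a variable.
The table is not perfectly formatted, I think.
Here is an example of the table's HTML code...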
<table border="1" rules="all">
<col />
<col />
<col align="char" char="." />
<col align="char" char="." />
<col />
<col />
<col align="char" char="m" />
<col align="char" char="m" />
<col align="char" char="." />
<col align="char" char="," />
<tr>
<th colspan="2" rowspan="2">Date & time</th>
<th rowspan="2">Temp</th>
<th rowspan="2">Feels like</th>
<th rowspan="2">Humidity</th>
<th colspan="3">Wind</th>
<th rowspan="2">Rain</th>
<th rowspan="2">Pressure</th>
</tr>
<tr>
<th>dir</th>
<th>ave</th>
<th>gust</th>
</tr>
<tr>
<td>2014/01/08</td>
<td>1056 GMT</td>
<td>11.0 °C</td>
<td>9.8 °C</td>
<td>74%</td>
<td>NNW</td>
<td>1 mph</td>
<td>6 mph</td>
<td>0.3 mm</td>
<td>1032.4 hPa, rising</td>
</tr>
<tr>
<td></td>
<td>1159 GMT</td>
<td>10.8 °C</td>
<td>9.7 °C</td>
<td>74%</td>
<td>SSE</td>
<td>1 mph</td>
<td>4 mph</td>
<td>0.0 mm</td>
<td>1032.0 hPa, rising slowly</td>
</tr>
<tr>
<td></td>
<td>1258 GMT</td>
<td>11.0 °C</td>
<td>9.9 °C</td>
<td>73%</td>
<td>SSE</td>
<td>1 mph</td>
<td>4 mph</td>
<td>0.0 mm</td>
<td>1031.5 hPa, falling slowly</td>
</tr>
<tr>
<td></td>
<td>1357 GMT</td>
<td>10.8 °C</td>
<td>9.7 °C</td>
<td>75%</td>
<td>SSW</td>
<td>1 mph</td>
<td>4 mph</td>
<td>0.0 mm</td>
<td>1030.7 hPa, falling</td>
</tr>
<tr>
<td></td>
<td>1456 GMT</td>
<td>10.3 °C</td>
<td>9.3 °C</td>
<td>77%</td>
<td>ENE</td>
<td>1 mph</td>
<td>4 mph</td>
<td>0.0 mm</td>
<td>1030.0 hPa, falling</td>
</tr>
<tr>
<td></td>
<td>1600 GMT</td>
<td>9.7 °C</td>
<td>8.7 °C</td>
<td>81%</td>
<td>WNW</td>
<td>1 mph</td>
<td>3 mph</td>
<td>0.0 mm</td>
<td>1028.7 hPa, falling</td>
</tr>
<tr>
<td></td>
<td>1658 GMT</td>
<td>8.9 °C</td>
<td>7.9 °C</td>
<td>86%</td>
<td>NNE</td>
<td>1 mph</td>
<td>4 mph</td>
<td>0.0 mm</td>
<td>1026.9 hPa, falling quickly</td>
</tr>
</table>
I have the following Python code, which gets all the data into rows:
#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
import urllib2
data = "http://****************/weather_station/data/6hrs.txt"
req = urllib2.Request(data)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
table = soup.find('table')
for row in table.findAll('tr'):
    col = row.findAll('td')
    # time = col[0].string
    # temp = col[1].string
    print col
This is where I'm stuck. time = col[0].string fails with "list index out of range", which means the list is empty, yet printing col shows the data I want to extract.
Any suggestions?
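A likely cause, sketched under some assumptions: the first two rows of the table contain only th cells, so findAll('td') returns an empty list for them and col[0] raises the IndexError, while the later rows print fine. The sketch below uses Python 3 and the bs4 package (BeautifulSoup 4) rather than the BeautifulSoup 3 / Python 2 setup in the question, and parses a trimmed, hypothetical copy of the table instead of fetching the URL. The column positions (1 for the time, 2 for the temperature) are assumed from the sample HTML above.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the first table: two header rows, two data rows.
html = """
<table border="1" rules="all">
<tr><th colspan="2" rowspan="2">Date &amp; time</th><th rowspan="2">Temp</th></tr>
<tr><th>dir</th><th>ave</th><th>gust</th></tr>
<tr><td>2014/01/08</td><td>1056 GMT</td><td>11.0 °C</td></tr>
<tr><td></td><td>1658 GMT</td><td>8.9 °C</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

readings = []
for row in soup.find("table").find_all("tr"):
    cells = row.find_all("td")
    if len(cells) < 3:
        # Header rows hold only <th> cells, so cells is an empty list here.
        continue
    # Column 0 is the date, column 1 the time, column 2 the temperature.
    readings.append((cells[1].get_text(strip=True),
                     cells[2].get_text(strip=True)))

# In this table the rows run oldest to newest, so the last row is the latest.
last_time, last_temp = readings[-1]
print(last_time, last_temp)
```

Skipping short rows by length, instead of slicing off a fixed number of header rows, keeps the loop working if the header layout changes.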
The answer below worked very well for that table. I would also like to get the same data from a table like this one...
<table border="1" rules="rows" cellspacing="0" cellpadding="5">
<col />
<col />
<col align="char" char="." />
<col align="char" char="." />
<col />
<col />
<col align="char" char="m" />
<col align="char" char="m" />
<col align="char" char="." />
<col align="char" char="," />
<tr>
<th rowspan="2">Time</th>
<th rowspan="2">Temp</th>
<th rowspan="2">Feels like</th>
<th rowspan="2">Humidity</th>
<th colspan="3">Wind</th>
<th rowspan="2">Rain</th>
<th rowspan="2">Pressure</th>
</tr>
<tr>
<th>dir</th>
<th>ave</th>
<th>gust</th>
</tr>
<tr>
<td>12:45 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
<td>7.1 <small>°C</small></td>
<td>66<small>%</small></td>
<td>W </td>
<td>1 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:40 <small>GMT:</small></td>
<td>8.9<small>C</small></td>
<td>6.9 <small>°C</small></td>
<td>66<small>%</small></td>
<td>SE </td>
<td>2 <small>mph</small></td>
<td>4 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:34 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
<td>6.3 <small>°C</small></td>
<td>66<small>%</small></td>
<td>NE </td>
<td>3 <small>mph</small></td>
<td>7 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:29 <small>GMT:</small></td>
<td>9.0<small>C</small></td>
<td>6.4 <small>°C</small></td>
<td>64<small>%</small></td>
<td>NW </td>
<td>3 <small>mph</small></td>
<td>6 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:24 <small>GMT:</small></td>
<td>9.6<small>C</small></td>
<td>7.4 <small>°C</small></td>
<td>63<small>%</small></td>
<td>S </td>
<td>2 <small>mph</small></td>
<td>5 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:19 <small>GMT:</small></td>
<td>10.1<small>C</small></td>
<td>7.4 <small>°C</small></td>
<td>61<small>%</small></td>
<td>SW </td>
<td>4 <small>mph</small></td>
<td>6 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:14 <small>GMT:</small></td>
<td>10.8<small>C</small></td>
<td>8.9 <small>°C</small></td>
<td>61<small>%</small></td>
<td>SE </td>
<td>2 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:09 <small>GMT:</small></td>
<td>10.7<small>C</small></td>
<td>8.8 <small>°C</small></td>
<td>61<small>%</small></td>
<td>N </td>
<td>2 <small>mph</small></td>
<td>3 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:04 <small>GMT:</small></td>
<td>10.3<small>C</small></td>
<td>8.5 <small>°C</small></td>
<td>64<small>%</small></td>
<td>NE </td>
<td>2 <small>mph</small></td>
<td>3 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>11:58 <small>GMT:</small></td>
<td>9.3<small>C</small></td>
<td>7.6 <small>°C</small></td>
<td>65<small>%</small></td>
<td>N </td>
<td>1 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>11:53 <small>GMT:</small></td>
<td>9.3<small>C</small></td>
<td>7.8 <small>°C</small></td>
<td>65<small>%</small></td>
<td>W </td>
<td>0 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>11:48 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
<td>7.1 <small>°C</small></td>
<td>66<small>%</small></td>
<td>W </td>
<td>1 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1021 <small>hPa</small></td>
</tr>
</table>
Using the same code, as follows:
table = soup.find('table')
for row in table.findAll('tr')[1:]:
    col = row.findAll('td')
    if len(col) >= 2:
        time = col[0].string
        temp = col[1].string
        print time
        print temp
time and temp both come back as "None".
If I print col, all the values are there. Why doesn't len(col) >= 2 work for this data?
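The length check is probably not the problem here. In this table each cell holds a text node plus a small tag, and .string is only defined when a tag has exactly one child, so it returns None for these cells. A minimal sketch of one way around this, assuming Python 3 and the bs4 package (BeautifulSoup 4) and a trimmed, hypothetical copy of the table; first_text is an illustrative helper, not part of the library:

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the second table.
html = """
<table border="1" rules="rows">
<tr><th rowspan="2">Time</th><th rowspan="2">Temp</th></tr>
<tr><td>12:45 <small>GMT:</small></td><td>8.8<small>C</small></td></tr>
<tr><td>12:40 <small>GMT:</small></td><td>8.9<small>C</small></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
first_cell = soup.find("td")

# The cell has two children (a text node and <small>), so .string is None.
string_value = first_cell.string

# get_text() joins every text node inside the cell.
full_text = first_cell.get_text(" ", strip=True)


def first_text(cell):
    """Return the first non-empty text node of a cell, or '' if there is none."""
    return next(cell.stripped_strings, "")


rows = [
    [first_text(td) for td in tr.find_all("td")]
    for tr in soup.find_all("tr")
    if tr.find("td") is not None  # skip header rows, which have no <td>
]

# In this table the rows run newest to oldest, so the first row is the latest.
newest_time, newest_temp = rows[0]
print(string_value, full_text, newest_time, newest_temp)
```

With the BeautifulSoup 3 API used in the question, joining the results of col[0].findAll(text=True) should give a comparable result to get_text().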
Many thanks, that works for the table above. I have added the following to display only the temperature without the unit: a = re.sub(r'[^0-9\-\d.]', '', temp) print time print – user3176960
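As a sketch of the unit-stripping idea in the comment above: instead of deleting every character that is not a digit, dot, or hyphen, one alternative is to search for the first number in the cell text and convert it. The helper name to_number and the pattern are illustrative assumptions, and the code targets Python 3.

```python
import re

# Optional minus sign, digits, optional fractional part.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def to_number(text):
    """Return the first number found in text as a float, or None if absent."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None


print(to_number("11.0 °C"))
print(to_number("1032.4 hPa, rising"))
print(to_number("NNW"))
```

Searching for the first number avoids keeping stray dots or hyphens that appear elsewhere in the cell, and it returns None for cells such as the wind direction that contain no number at all.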