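Web scraping error
I am extracting some data from a website with the code below, but I have a problem with the duration on this line: duration = tr.select('td.duration')[0].contents[0].strip()
It raises the exception shown below. How can I fix this line so that the duration data is extracted? Thank you. I have searched SO for similar questions, but they do not quite answer mine.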
# import needed libraries
from mechanize import Browser
from bs4 import BeautifulSoup
import csv
br = Browser()
# Ignore robots.txt
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Chrome')]
# Retrieve the home page
br.open('http://fahrplan.sbb.ch/bin/query.exe/en')
br.select_form(nr=6)
br.form["REQ0JourneyStopsS0G"] = 'Eisenstadt' # Origin train station (From)
br.form["REQ0JourneyStopsZ0G"] = 'sarajevo' # Destination train station (To)
br.form["REQ0JourneyTime"] = '5:30' # Search Time
br.form["date"] = '18.01.17' # Search Date
# Get the search results
br.submit()
# get the response from mechanize Browser
soup = BeautifulSoup(br.response().read(), 'lxml', from_encoding="utf-8")
trs = soup.select('table.hfs_overview tr')
# scrape the contents of the table to csv (This is not complete as I cannot write the duration column to the csv)
with open('out.csv', 'w') as f:
    for tr in trs:
        locations = tr.select('td.location')
        if len(locations) > 0:
            location = locations[0].contents[0].strip()
            prefix = tr.select('td.prefix')[0].contents[0].strip()
            time = tr.select('td.time')[0].contents[0].strip()
            duration = tr.select('td.duration')[0].contents[0].strip()
            f.write("{},{},{},{}\n".format(location.encode('utf-8'), prefix, time, duration))
Traceback (most recent call last):
  File "C:/.../tester.py", line 204, in <module>
    duration = tr.select('td.duration')[0].contents[0].strip()
IndexError: list index out of range
Process finished with exit code 1
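Do you understand what 'IndexError: list index out of range' means? The error is quite explicit. – Carcigenicate
From the looks of it, the site doesn't have any 'td' elements, or the first 'td' element doesn't contain anything. Debug to find out which it is. – Carcigenicate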
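One way to handle both cases raised in the comments is to check the result of the selector before indexing into it. The sketch below is only an illustration: `first_text` and `FakeCell` are hypothetical names that do not appear in the original code, and `FakeCell` merely stands in for a BeautifulSoup tag (anything with a `.contents` list) so the example runs without a network request. It assumes that rows lacking a duration cell should yield an empty string rather than be skipped.

```python
def first_text(cells, default=''):
    """Return the stripped first child of the first matched cell, or `default`.

    `cells` is whatever tr.select('td.duration') returned: a list of
    elements, each exposing a `.contents` list as BeautifulSoup tags do.
    """
    if not cells:
        # Case 1: the selector matched nothing in this row.
        return default
    contents = cells[0].contents
    if not contents:
        # Case 2: the cell exists but has no children.
        return default
    # str() guards against the first child being a nested tag, not text.
    return str(contents[0]).strip()


class FakeCell:
    """Hypothetical stand-in for a bs4 Tag, used only to exercise the helper."""
    def __init__(self, *contents):
        self.contents = list(contents)


print(repr(first_text([])))                     # no matching cell
print(repr(first_text([FakeCell()])))           # cell present but empty
print(repr(first_text([FakeCell(' 12:40 ')])))  # cell with text
```

Inside the loop from the question, the failing line would then become `duration = first_text(tr.select('td.duration'))`, and the same helper could be applied to the `td.prefix` and `td.time` lookups. Printing `len(tr.select('td.duration'))` for each row is a quick way to confirm which of the two cases the page actually produces.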