2017-02-16

In the code below I fill out a form and submit it on the website, then scrape the resulting data and write it to a CSV file (all of this works fine). But there is a link with the text 'Later' on the results page; how can I click this link? I am using mechanize. I checked a similar question (this), but it doesn't fully answer my question: clicking a link on the web page after submitting a form.

# import needed libraries 
from mechanize import Browser 
from datetime import datetime 
from bs4 import BeautifulSoup 
import csv 

br = Browser() 

# Ignore robots.txt 
br.set_handle_robots(False) 

# The site expects a browser-like user-agent
br.addheaders = [('User-agent', 'Chrome')]

# Retrieve the SBB timetable search page, saving the response
br.open('http://fahrplan.sbb.ch/bin/query.exe/en')


# Enter the text input (This section should be automated to read multiple text input as shown in the question) 
br.select_form(nr=6) 

br.form["REQ0JourneyStopsS0G"] = 'Eisenstadt' # Origin train station (From) 
br.form["REQ0JourneyStopsZ0G"] ='sarajevo' # Destination train station (To) 
br.form["REQ0JourneyTime"] = '5:30' # Search Time 
br.form["date"] = '18.01.17' # Search Date 

# Get the search results 
br.submit() 

# get the response from mechanize Browser 
soup = BeautifulSoup(br.response().read(), 'html.parser', from_encoding="utf-8") 
trs = soup.select('table.hfs_overview tr') 

# scrape the contents of the table to csv (This is not complete as I cannot write the duration column to the csv)
with open('out.csv', 'w') as f:
    for tr in trs:
        locations = tr.select('td.location')
        if len(locations) > 0:
            location = locations[0].contents[0].strip()
            prefix = tr.select('td.prefix')[0].contents[0].strip()
            time = tr.select('td.time')[0].contents[0].strip()
            # not every row has a duration cell
            durations = tr.select('td.duration')
            if len(durations) == 0:
                duration = ''
            else:
                duration = durations[0].contents[0].strip()
            f.write("{},{},{},{}\n".format(location.encode('utf-8'), prefix, time, duration))
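The row-scraping logic above can be factored into a helper that is easy to test offline, before wiring it to the live response. This is a sketch: `parse_connections` is a hypothetical name, and the sample markup below only mimics the structure of the `hfs_overview` table; it is not real site output.

```python
from bs4 import BeautifulSoup

def parse_connections(html):
    """Parse (location, prefix, time, duration) tuples from a results page."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for tr in soup.select('table.hfs_overview tr'):
        locations = tr.select('td.location')
        if not locations:
            continue  # header/spacer rows have no location cell
        durations = tr.select('td.duration')
        rows.append((
            locations[0].get_text(strip=True),
            tr.select('td.prefix')[0].get_text(strip=True),
            tr.select('td.time')[0].get_text(strip=True),
            # not every row has a duration cell
            durations[0].get_text(strip=True) if durations else '',
        ))
    return rows

# Hypothetical sample mimicking the table structure
sample = """
<table class="hfs_overview">
  <tr><td class="location">Eisenstadt</td><td class="prefix">dep</td>
      <td class="time">05:30</td><td class="duration">18:32</td></tr>
  <tr><td class="location">Sarajevo</td><td class="prefix">arr</td>
      <td class="time">00:02</td></tr>
</table>
"""
```

Using `get_text(strip=True)` instead of `contents[0].strip()` also avoids crashes on cells whose first child is a nested tag rather than a string.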

Answer

The HTML for the Later link looks like:

<a accesskey="l" class="hafas-browse-link" href="http://fahrplan.sbb.ch/bin/query.exe/en?ld=std2.a&amp;seqnr=1&amp;ident=kv.047469247.1487285405&amp;REQ0HafasScrollDir=1" id="hfs_linkLater" title="Search for later connections">Later</a> 

You can find the URL with:

In [22]: soup.find('a', text='Later')['href'] 
Out[22]: u'http://fahrplan.sbb.ch/bin/query.exe/en?ld=std2.a&seqnr=1&ident=kv.047469247.1487285405&REQ0HafasScrollDir=1' 
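Matching on the link text works, but the `id` attribute visible in the snippet above is likely more stable than the display text (which can change with the site's language setting). A minimal sketch, using the anchor markup from above as test input:

```python
from bs4 import BeautifulSoup

html = ('<a accesskey="l" class="hafas-browse-link" '
        'href="http://fahrplan.sbb.ch/bin/query.exe/en?ld=std2.a&amp;seqnr=1'
        '&amp;ident=kv.047469247.1487285405&amp;REQ0HafasScrollDir=1" '
        'id="hfs_linkLater" title="Search for later connections">Later</a>')

soup = BeautifulSoup(html, 'html.parser')
# match on the stable id attribute instead of the link text
later = soup.find('a', id='hfs_linkLater')
```

Note that BeautifulSoup unescapes `&amp;` to `&` in the extracted `href`, so the URL can be passed to `br.open` as-is.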

To make the browser follow that link, call br.open:

In [21]: br.open(soup.find('a', text='Later')['href']) 
Out[21]: <response_seek_wrapper at 0x7f346a5da320 whose wrapped object = <closeable_response at 0x7f3469bee830 whose fp = <socket._fileobject object at 0x7f34697f26d0>>>
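To collect several pages of later connections, the open-then-scrape step can be repeated until no Later link remains. The sketch below keeps the page-fetching pluggable (`fetch` is a hypothetical callable, `scrape_pages` a hypothetical name) so the loop logic can be tested without network access:

```python
from bs4 import BeautifulSoup

def scrape_pages(fetch, first_html, max_pages=5):
    """Collect result-page HTML by following 'Later' links.

    fetch(url) -> html string; max_pages caps the number of requests.
    """
    pages = [first_html]
    soup = BeautifulSoup(first_html, 'html.parser')
    while len(pages) < max_pages:
        link = soup.find('a', id='hfs_linkLater')
        if link is None:
            break  # no further connections offered
        html = fetch(link['href'])
        pages.append(html)
        soup = BeautifulSoup(html, 'html.parser')
    return pages
```

With mechanize this would be driven as something like `scrape_pages(lambda url: br.open(url).read(), br.response().read())`, and each returned page fed to the same table-scraping code as before.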