2009-07-04 107 views
1

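Automatically pressing a "submit" button using Python

The bus company I use runs an awful website (Hebrew, English) that turns a simple "timetable from A to B today" query into a nightmare. I suspect they are trying to encourage use of their expensive SMS query system.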

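I'm trying to harvest the entire timetable from the site by submitting a query from every possible origin to every possible destination, which would add up to about 10K queries. The query result appears in a popup window. I'm quite new to web programming, but familiar with the basics of Python.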

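  1. What is the most elegant way to parse the page, choose a value from a drop-down menu, and press "submit" from a script?
  2. How do I give the program the content of the new popup window as input?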

Thanks!

Answers

10

Twill is a simple scripting language for Web browsing. It also happens to sport a Python API:

twill is essentially a thin shell around the mechanize package. All twill commands are implemented in the commands.py file, and pyparsing does the work of parsing the input and converting it into Python commands (see parse.py). Interactive shell work and readline support is implemented via the cmd module (from the standard Python library).

An example of "pressing" submit, from the documentation linked above:

from twill.commands import go, showforms, formclear, fv, submit 

go('http://issola.caltech.edu/~t/qwsgi/qwsgi-demo.cgi/') 
go('./widgets') 
showforms() 

formclear('1') 
fv("1", "name", "test") 
fv("1", "password", "testpass") 
fv("1", "confirm", "yes") 
showforms() 

submit('0') 
+0

Because of an error, I had to use submit() rather than submit('0'): HiddenControl instance has no attribute '_click'. See: lists.idyll.org/pipermail/twill/2006-August/000526.html – user391339 2014-09-11 07:59:58

10

I would suggest using mechanize. Here is a code snippet from their page showing how to submit a form:


import re 
from mechanize import Browser 

br = Browser() 
br.open("http://www.example.com/") 
# follow second link with element text matching regular expression 
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1) 
assert br.viewing_html() 
print(br.title())
print(response1.geturl())
print(response1.info())  # headers
print(response1.read())  # body
response1.close() # (shown for clarity; in fact Browser does this for you) 

br.select_form(name="order") 
# Browser passes through unknown attributes (including methods) 
# to the selected HTMLForm (from ClientForm). 
br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__) 
response2 = br.submit() # submit current form 

# print currently selected form (don't call .submit() on this, use br.submit()) 
print(br.form)

7

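You rarely want to actually "press the submit button" rather than make GET or POST requests directly to the handler resource. Look at the HTML where the form is, and see which parameters it submits to which URL, and whether the method is GET or POST. You can form these requests quite easily with urllib(2).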
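As a rough sketch of that approach, the snippet below inspects a form, lists the values offered by its drop-downs, and builds one request per origin/destination pair without sending anything. It uses Python 3's standard library (urllib.request takes the place of urllib2 there). The form markup, the field names origin and destination, the /timetable action, and the example.com URL are all made up for illustration; the real site's HTML will differ and has to be read first.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode, urljoin
from urllib.request import Request

# Hypothetical form markup standing in for the real timetable page.
SAMPLE_HTML = """
<form action="/timetable" method="post">
  <select name="origin">
    <option value="1">Haifa</option>
    <option value="2">Tel Aviv</option>
  </select>
  <select name="destination">
    <option value="1">Haifa</option>
    <option value="2">Tel Aviv</option>
  </select>
  <input type="submit" value="Search">
</form>
"""


class FormInspector(HTMLParser):
    """Collect the form's action, method, and the values of each <select>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"   # HTML default when no method attribute is given
        self.options = {}     # select name -> list of option values
        self._current = None  # name of the <select> being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            self.method = attrs.get("method", "get").upper()
        elif tag == "select":
            self._current = attrs.get("name")
            self.options[self._current] = []
        elif tag == "option" and self._current is not None:
            self.options[self._current].append(attrs.get("value"))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def build_request(page_url, form, fields):
    """Build (but do not send) the request a browser would make on submit."""
    url = urljoin(page_url, form.action)
    query = urlencode(fields)
    if form.method == "POST":
        # Supplying data makes urllib issue a POST.
        return Request(url, data=query.encode("ascii"))
    return Request(url + "?" + query)


form = FormInspector()
form.feed(SAMPLE_HTML)

page_url = "http://www.example.com/search"  # placeholder, not the real site
pending = [
    build_request(page_url, form, {"origin": a, "destination": b})
    for a, b in product(form.options["origin"], form.options["destination"])
    if a != b  # skip trips from a stop to itself
]
```

Each entry in pending could then be passed to urllib.request.urlopen() and the response body read with .read(). If the popup window is simply the page returned by the form's target URL, that response is already its content; if the site builds the popup with JavaScript instead, the URL it requests would need to be found in the page's scripts.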

+1

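The mechanize package can spare you the tedious "...see which parameters it submits..." details. twill wraps mechanize and provides a higher-level abstraction. – gimel 2009-07-05 17:40:37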