2012-01-17

When I run the code below, I get a mechanize._html.ParseError exception. How do I make it shut up? I know it's invalid HTML, but I still need to parse it. I googled around and was told to replace br = mechanize.Browser() with br = mechanize.Browser(factory=mechanize.RobustFactory()), but that didn't work.

import mechanize

# br = mechanize.Browser()  # default factory: fails the same way
br = mechanize.Browser(factory=mechanize.RobustFactory())
br.set_handle_robots(False)  # don't fetch or obey robots.txt
br.open("http://journeyplanner.irishrail.ie/bin/query.exe")
for form in br.forms():  # the ParseError is raised here
    print form
    print

Answer


Why are you opening an .exe file with mechanize? It's meant for opening web pages. If you want to download the .exe file, use br.retrieve() instead.

Edit:

By the way, your code produces this output for me:

<formular POST http://journeyplanner.irishrail.ie/bin/query.exe/dn?ld=1.1&OK#focus application/x-www-form-urlencoded 
    <HiddenControl(queryPageDisplayed=yes) (readonly)> 
    <HiddenControl(HWAI=JS!ajax=yes) (disabled, readonly)> 
    <HiddenControl(HWAI=JS!js=yes) (disabled, readonly)> 
    <HiddenControl(outwardConDetails=) (readonly)> 
    <ImageControl(start=Verbindung suchen)> 
    <TextControl(REQ0JourneyStopsS0A=255)> 
    <TextControl(REQ0JourneyStopsS0G=)> 
    <HiddenControl(REQ0JourneyStopsS0ID=) (readonly)> 
    <TextControl(REQ0JourneyStopsZ0A=255)> 
    <TextControl(REQ0JourneyStopsZ0G=)> 
    <HiddenControl(REQ0JourneyStopsZ0ID=) (readonly)> 
    <RadioControl(journey_mode=[*single, return])> 
    <TextControl(REQ0JourneyDate=17/01/2012)> 
    <SelectControl(REQ0JourneyTime=[*0, 00, 9, 14, 18])> 
    <HiddenControl(REQ0HafasPeriodToSearch=1440) (readonly)> 
    <HiddenControl(REQ0HafasPeriodSearch=2) (readonly)> 
    <HiddenControl(REQ0HafasSearchForw=1) (readonly)> 
    <CheckboxControl(special_search_both=[1])> 
    <TextControl(REQ1JourneyDate=)> 
    <SelectControl(REQ1JourneyTime=[*0, 00, 9, 14, 18])> 
    <HiddenControl(REQ1HafasPeriodToSearch=1440) (readonly)> 
    <HiddenControl(REQ1HafasPeriodSearch=2) (readonly)> 
    <HiddenControl(REQ1HafasSearchForw=1) (readonly)> 
    <SubmitControl(start=Go) (readonly)> 
    <SubmitControl(start=Go) (readonly)>> 

Edit:

Oh, I was wrong... it's not an .exe file. I downloaded it and opened it in a text editor, and it's just an HTML file! It also works with br = mechanize.Browser().
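As the edit above found, a URL whose path ends in .exe can still return an ordinary HTML page. The reliable signal is the response's Content-Type header, not the extension in the URL. A minimal sketch using only Python 3's standard library (the helper name `is_html` is hypothetical, not part of mechanize):

```python
from email.message import Message


def is_html(content_type_header):
    """Return True if a Content-Type header value denotes an HTML document.

    Parameters such as "; charset=ISO-8859-1" are ignored, and the media
    type is compared case-insensitively (get_content_type lower-cases it).
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type() == "text/html"


print(is_html("text/html; charset=ISO-8859-1"))  # True
print(is_html("application/x-msdownload"))       # False
```

With a live response from `urllib.request.urlopen`, the same check is available directly as `response.headers.get_content_type()`, because those headers are an `email.message.Message` subclass; the sketch above does not fetch anything itself.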


I believe it would work with an older version of BeautifulSoup, since (iirc) mechanize uses BeautifulSoup as its HTML parser. However, that isn't an option. – wheybags 2012-01-17 19:23:19


I get:

Traceback (most recent call last):
  File "./train.py", line 9, in <module>
    for form in br.forms():
  File "/usr/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 426, in forms
    return self._factory.forms()
  File "/usr/lib/python2.6/dist-packages/mechanize/_html.py", line 559, in forms
    self._forms_factory.forms())
  File "/usr/lib/python2.6/dist-packages/mechanize/_html.py", line 228, in forms
    raise ParseError(exc)
mechanize._html.ParseError

– wheybags 2012-01-17 19:24:17
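The traceback shows the exception coming out of mechanize's form parsing inside br.forms(). If the goal is only to see which forms and fields a malformed page contains, one fallback (a swapped-in technique, not a mechanize feature) is Python 3's standard-library `html.parser.HTMLParser`, which reports tags as it meets them and does not raise on unbalanced or unclosed markup. A minimal sketch; the class `FormLister`, the helper `list_forms` and the sample HTML are all hypothetical:

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Record each <form>'s method, action and named fields.

    html.parser does not enforce nesting, so stray closing tags,
    unclosed <p>/<b> elements and the like are simply passed through.
    """

    FIELD_TAGS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.forms = []        # one dict per <form> seen
        self._current = None   # the <form> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)    # tag and attribute names arrive lower-cased
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "GET").upper(),
                "action": attrs.get("action") or "",
                "fields": [],
            }
            self.forms.append(self._current)
        elif tag in self.FIELD_TAGS and self._current is not None:
            name = attrs.get("name")
            if name:
                self._current["fields"].append((tag, name))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def list_forms(html_text):
    """Parse html_text and return the forms found, tolerating bad markup."""
    parser = FormLister()
    parser.feed(html_text)
    parser.close()
    return parser.forms


# Hypothetical, deliberately malformed sample: stray </td>, unclosed <p>/<b>,
# <option> tags never closed, no </html>.
SAMPLE = """
<html><body></td>
<form name="formular" method="post" action="query.exe/dn">
  <input type="hidden" name="queryPageDisplayed" value="yes">
  <p>From: <b><input type="text" name="REQ0JourneyStopsS0G">
  <select name="REQ0JourneyTime"><option value="0">0<option value="9">9</select>
  <input type="submit" name="start" value="Go">
</form>
</body>
"""

for form in list_forms(SAMPLE):
    print(form["method"], form["action"])
    for tag, name in form["fields"]:
        print("   ", tag, name)
```

This only lists forms and field names; it does not fill in or submit anything, so posting the form would still need mechanize (or hand-built requests).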