2010-05-02

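Python mechanize: can't avoid a redirect when POSTing

I'm trying to scrape a website using mechanize. The site splits its search results across several pages. When I POST to get the next set of results, something goes wrong: the server redirects me to the first page and tells mechanize to update the SearchSession cookie.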
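
For context: the Firefox capture below sets __EVENTTARGET to srpPager%24btnForward, which is how an ASP.NET __doPostBack link posts back. __doPostBack fills in __EVENTTARGET/__EVENTARGUMENT and submits the form without any submit button's name/value. The stdlib-only sketch below builds such a body; the function name build_postback_body, its submit_names parameter and the trimmed field values are hypothetical, and it assumes the form's fields (including __VIEWSTATE) have already been scraped from the current page.

```python
from urllib.parse import urlencode


def build_postback_body(form_fields, event_target, event_argument="", submit_names=()):
    """Build a form-urlencoded body imitating an ASP.NET __doPostBack call.

    form_fields: field name -> value, as scraped from the current results page
    (hidden fields such as __VIEWSTATE plus the visible inputs).
    submit_names: names of submit buttons to leave out, because __doPostBack
    submits the form without pressing any button.
    """
    fields = {name: value for name, value in form_fields.items()
              if name not in submit_names}
    # __doPostBack fills in these two hidden fields before calling form.submit().
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    # urlencode percent-encodes '$' as %24, as in the captured bodies.
    return urlencode(fields)


# Hypothetical, heavily trimmed values ("/wEPDw" stands in for the real __VIEWSTATE).
fields = {
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": "/wEPDw",
    "Refinesearch$txtKeywords": "Python",
    "Refinesearch$btnSearch": "Search",
}
body = build_postback_body(fields, "srpPager$btnForward",
                           submit_names={"Refinesearch$btnSearch"})
print(body)
# -> __EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPDw&Refinesearch%24txtKeywords=Python
```

With mechanize, a body like this could then be POSTed with Browser.open(url, data), so the browser's cookie jar is reused; whether this particular site accepts it is untested.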

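I've been comparing my requests with Firefox's, and they look exactly the same, but I can't find the problem. Any suggestions? Here are the requests: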

-----------FIRST, THE CORRECT SEQUENCE, USING TAMPER DATA IN FIREFOX-------------------------
POST XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2C+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London
Load Flags[LOAD_DOCUMENT_URI LOAD_INITIAL_DOCUMENT_URI] Content Size[-1] Mime Type[text/html]
Request Headers:
Host[www.cwjobs.co.uk]
User-Agent[Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100401 Ubuntu/9.10 (karmic) Firefox/3.5.9]
Accept[text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8]
Accept-Language[en-us,en;q=0.5]
Accept-Encoding[gzip,deflate]
Accept-Charset[ISO-8859-1,utf-8;q=0.7,*;q=0.7]
Keep-Alive[300]
Connection[keep-alive]
Referer[XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2c+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London]
Cookie[ecos=774803468-0; AnonymousUser=MemberId=acc079dd-66b6-4081-9b07-60d6955ee8bf&IsAnonymous=True; PJBIPPOPUP=; WT_FPC=ID=86.181.183.106-2262469600.30073025:LV=1272812851736:SS=1272812789362; SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT]
Post Data:
__EVENTTARGET[srpPager%24btnForward]
__EVENTARGUMENT[]
hdnSearchResults[BV%2CA%2CC0P5x%2COou-%2CB4S-%2CBuC-%2CDzx-%2CHwn-%2CKPP-%2CIVA-%2CC9D-%2CH6X-%2CH7x-%2CJ0x-%2CCvX-%2CCra-%2COHa-%2CHhP-%2CCoj-%2CBlM-%2CE9W-%2CIm8-%2CBqG-%2CPFy-%2CN%2Fm-%2Ceaa%2CCvj-%2CCtJ-%2CCr7-%2CBpu-%2Cmh%2CMb6-%2CJ%2Fk-%2CHY8-%2COJ7-%2CNtF-%2CEya-%2CErT-%2CEo4-%2CEKU-%2CDnL-%2CC5M-%2CCyB-%2CBsD-%2CBrc-%2CBpU-%2Col%2C30 2CC1%%2Cd4N%2COo8-%2COi0-%2CLz%2F-%2CLxP-%2CFyp-%2CFVR-%2CEHL-%2CPrP-%2CLmE-%2CK3H-%2CKXJ-%2CFyn%2CIcq-%2CIco-%2CIK4-%2CIIg-%2CH2k-%2CH0N-%2CHwp-%2CHvF-%2CFij-%2CFhl-%2CCwj-%2CCb5-%2CCQj-%2CCQh-%2CB%2B2-%2CBc6-%2ChFo%2CNLq-%2CNI%2F-%2CFzM-%2Cdu-%2CHg2-%2CBug-%2CBse-%2CB9Q-]
__VIEWSTATE[%2FwEPDwUKLTkyMzI2ODA4Ng9kFgYCBA8WBB4EaHJlZgWJAWh0dHA6Ly93d3cuY3dqb2JzLmNvLnVrL0pvYlNlYXJjaC9SU1MuYXNweD9LZXl3b3Jkcz1QeXRob24mTFR4dD1Mb25kb24lMmMrU291dGgrRWFzdCZSYWRpdXM9MCZMSWRzMj1aViZjbGlkPTE2MjEmY2x0eXBlaWQ9MiZjbE5hbWU9TG9uZG9uHgV0aXRsZQUkTGF0ZXN0IFB5dGhvbiBqb2JzIGZyb20gQ1dKb2JzLmNvLnVrZAIGDxYCHgRUZXh0BV48bGluayByZWw9ImNhbm9uaWNhbCIgaHJlZj0iaHR0cDovL3d3dy5jd2pvYnMuY28udWsvSm9iU2Vla2luZy9QeXRob25fTG9uZG9uX2wxNjIxX3QyLmh0bWwiIC8%2BZAIIEGRkFg4CBw8WAh8CBV9Zb3VyIHNlYXJjaCBvbiA8Yj5LZXl3b3JkczogUHl0aG9uOyBMb2NhdGlvbjogTG9uZG9uLCBTb3V0aCBFYXN0OyA8L2I%2BIHJldHVybmVkIDxiPjg1PC9iPiBqb2JzLmQCCQ8WAh4HVmlzaWJsZWhkAgsPFgIfAgUoVGhlIG1vc3QgcmVsZXZhbnQgam9icyBhcmUgbGlzdGVkIGZpcnN0LmQCEw8PFgIeC05hdmlnYXRlVXJsBQF%2BZGQCFQ9kFgYCBQ8PFgYfAgUGUHl0aG9uHgtEZWZhdWx0VGV4dAUMZS5nLiBhbmFseXN0HhNEZWZhdWx0VGV4dENzc0NsYXNzZWRkAgsPDxYGHwIFEkxvbmRvbiwgU291dGggRWFzdB8FBQllLmcuIEJhdGgfBmVkZAIRDxAPFgYeDURhdGFUZXh0RmllbGQFClJhZGl1c05hbWUeDkRhdGFWYWx1ZUZpZWxkBQZSYWRpdXMeC18hRGF0YUJvdW5kZ2QQFREHMCBtaWxlcwcyIG1pbGVzBzUgbWlsZXMIMTAgbWlsZXMIMTUgbWlsZXMIMjAgbWlsZXMIMjUgbWlsZXMIMzAgbWlsZXMIMzUgbWlsZXMINDAgbWlsZXMINDUgbWlsZXMINTAgbWlsZXMINjAgbWlsZXMINzAgbWlsZXMIODAgbWlsZXMIOTAgbWlsZXMJMTAwIG1pbGVzFREBMAEyATUCMTACMTUCMjACMjUCMzACMzUCNDACNDUCNTACNjACNzACODACOTADMTAwFCsDEWdnZ2dnZ2dnZ2dnZ2dnZ2dnZGQCFw9kFgQCAQ9kFgQCBA8QZA8WA2YCAQICFgMQBQhBbGwgam9icwUBMGcQBRlEaXJlY3QgZW1wbG95ZXIgam9icyBvbmx5BQEyZxAFEEFnZW5jeSBqb2JzIG9ubHkFATFnZGQCBg8QZA8WA2YCAQICFgMQBQlSZWxldmFuY2UFATFnEAUERGF0ZQUBMmcQBQZTYWxhcnkFATNnZGQCBQ8PFgYeClBhZ2VOdW1iZXICAh4PTnVtYmVyT2ZSZXN1bHRzAlUeDlJlc3VsdHNQZXJQYWdlAhRkZAIZDxYCHwNoZGQ%3D]
Refinesearch%24txtKeywords[Python]
Refinesearch%24txtLocation[London%2C+South+East]
Refinesearch%24ddlRadius[0]
ddlCompanyType[0]
ddlSort[1]
Response Headers:
Cache-Control[private]
Date[Sun, 02 May 2010 16:09:27 GMT]
Content-Type[text/html; charset=utf-8]
X-Powered-By[ASP.NET]
X-Site-Host[P310]
X-Powered-By[ASP.NET]
X-AspNet-Version[2.0.50727]
Set-Cookie[SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT; path=/]
Content-Encoding[gzip]
Vary[Accept-Encoding]
Transfer-Encoding[chunked]

--------WHAT I'M SENDING NOW USING MECHANIZE (SOME HEADERS ADDED, ETC.)-----------
POST /JobSearch/Results.aspx?Keywords=Python&LTxt=London%2C+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London HTTP/1.1\r\n
Content-Length: 2424\r\n
Accept-Language: zh-cn,en;q=0.5\r\n
Accept-Encoding: gzip\r\n
Host: www.cwjobs.co.uk\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
Connection: keep-alive\r\n
Cookie: AnonymousUser=MemberId=8fa5ddd7-17ed-425e-b189-82693bfbaa0c&IsAnonymous=True; SearchSession=SessionGuid=33e4e439-c2d6-423f-900f-574099310d5a&LogSource=NAT\r\n
Referer: XXX/JobSearch/Results.aspx?Keywords=Python&LTxt=London%2C+South+East&Radius=0&LIds2=ZV&clid=1621&cltypeid=2&clName=London\r\n
Content-Type: application/x-www-form-urlencoded\r\n
\r\n
'__EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT=&hdnSearchResults=BV%2CA%2CC0eif%2CMwc%2CM6s%2COou%2CK09%2CG4H%2CEZf%2CGTu%2CLrr%2CGuX%2CGs9%2CEz9%2CL5X%2CL9U%2ChU%2CHHf%2CMAL%2CNDi%2CJrY%2CGBy%2CM%2BO%2CdE-%2CpI%2CtDI%2CL5L%2CL7l%2CL8z%2CM%2FA%2CPPP%2CCM0%2CEpK%2CHPy%2Cez%2C7p%2CJ2U%2CJ9b%2CJ%2F2%2CKea%2CLBj%2CLvi%2CL2t%2CM8r%2CM9S%2CM%2FA%2CPRT%2CPgi%2Csg7%2CF6%2CI2F%2CJTd%2CO-%2CC0v%2CC3f%2CDCq%2CDxn%2CERl%2CUbV%2CGME%2CGMG%2CGd2%2CGgO%2CGyK%2CG0h%2CG4F%2CG5p%2CJGL%2CJHJ%2CKhj%2CL4L%2CMM1%2CMYL%2CMYN%2CMp4%2CNL0%2COrj%2CvuW%2CBdE%2CBfv%2CI1i%2CBCh-%2COLA%2CHH4%2CM6O%2CM8Q%2CMre&__VIEWSTATE=%2FwEPDwUKLTkyMzI2ODA4Ng9kFgYCBA8WBB4EaHJlZgWJAWh0dHA6Ly93d3cuY3dqb2JzLmNvLnVrL0pvYlNlYXJjaC9SU1MuYXNweD9LZXl3b3Jkcz1QeXRob24mTFR4dD1Mb25kb24lMmMrU291dGgrRWFzdCZSYWRpdXM9MCZMSWRzMj1aViZjbGlkPTE2MjEmY2x0eXBlaWQ9MiZjbE5hbWU9TG9uZG9uHgV0aXRsZQUkTGF0ZXN0IFB5dGhvbiBqb2JzIGZyb20gQ1dKb2JzLmNvLnVrZAIGDxYCHgRUZXh0BV48bGluayByZWw9ImNhbm9uaWNhbCIgaHJlZj0iaHR0cDovL3d3dy5jd2pvYnMuY28udWsvSm9iU2Vla2luZy9QeXRob25fTG9uZG9uX2wxNjIxX3QyLmh0bWwiIC8%2BZAIIEGRkFg4CBw8WAh8CBV9Zb3VyIHNlYXJjaCBvbiA8Yj5LZXl3b3JkczogUHl0aG9uOyBMb2NhdGlvbjogTG9uZG9uLCBTb3V0aCBFYXN0OyA8L2I%2BIHJldHVybmVkIDxiPjg1PC9iPiBqb2JzLmQCCQ8WAh4HVmlzaWJsZWhkAgsPFgIfAgUoVGhlIG1vc3QgcmVsZXZhbnQgam9icyBhcmUgbGlzdGVkIGZpcnN0LmQCEw8PFgIeC05hdmlnYXRlVXJsBQF%2BZGQCFQ9kFgYCBQ8PFgYfAgUGUHl0aG9uHgtEZWZhdWx0VGV4dAUMZS5nLiBhbmFseXN0HhNEZWZhdWx0VGV4dENzc0NsYXNzZWRkAgsPDxYGHwIFEkxvbmRvbiwgU291dGggRWFzdB8FBQllLmcuIEJhdGgfBmVkZAIRDxAPFgYeDURhdGFUZXh0RmllbGQFClJhZGl1c05hbWUeDkRhdGFWYWx1ZUZpZWxkBQZSYWRpdXMeC18hRGF0YUJvdW5kZ2QQFREHMCBtaWxlcwcyIG1pbGVzBzUgbWlsZXMIMTAgbWlsZXMIMTUgbWlsZXMIMjAgbWlsZXMIMjUgbWlsZXMIMzAgbWlsZXMIMzUgbWlsZXMINDAgbWlsZXMINDUgbWlsZXMINTAgbWlsZXMINjAgbWlsZXMINzAgbWlsZXMIODAgbWlsZXMIOTAgbWlsZXMJMTAwIG1pbGVzFREBMAEyATUCMTACMTUCMjACMjUCMzACMzUCNDACNDUCNTACNjACNzACODACOTADMTAwFCsDEWdnZ2dnZ2dnZ2dnZ2dnZ2dnZGQCFw9kFgQCAQ9kFgQCBA8QZA8WA2YCAQICFgMQBQhBbGwgam9icwUBMGcQBRlEaXJlY3QgZW1wbG95ZXIgam9icyBvbmx5BQEyZxAFEEFnZW5jeSBqb2JzIG9ubHkFATFnZGQCBg8QZA8WA2YCAQICFgMQBQlSZWxldmFuY2UFATFnEAUERGF0ZQUBMmcQBQZTYWxhcnkFATNnZGQCBQ8PFgYeClBhZ2VOdW1iZXICAR4PTnVtYmVyT2ZSZXN1bHRzAlUeDlJlc3VsdHNQZXJQYWdlAhRkZAIZDxYCHwNoZGQ%3D&Refinesearch%24txtKeywords=Python&Refinesearch%24txtLocation=London%2CSouth+East&Refinesearch%24ddlRadius=0&Refinesearch%24btnSearch=Search&ddlCompanyType=0&ddlSort=1'
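
Comparing two long bodies by eye is error-prone, so a field-by-field diff may help. The hypothetical helper below (stdlib only) parses both bodies with urllib.parse.parse_qs and lists the fields present in only one request or whose values differ. Run on trimmed, re-encoded excerpts of the two captures above, it shows that only the mechanize body carries a Refinesearch$btnSearch field; whether that extra field is what triggers the redirect is an assumption worth testing, not something the captures prove. On the full bodies it would also report hdnSearchResults and __VIEWSTATE, whose values differ between the two captures.

```python
from urllib.parse import parse_qs


def diff_form_bodies(body_a, body_b):
    """Compare two application/x-www-form-urlencoded bodies field by field.

    Returns (only_in_a, only_in_b, changed), each a sorted list of
    decoded field names.
    """
    a = parse_qs(body_a, keep_blank_values=True)
    b = parse_qs(body_b, keep_blank_values=True)
    only_in_a = sorted(a.keys() - b.keys())
    only_in_b = sorted(b.keys() - a.keys())
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return only_in_a, only_in_b, changed


# Trimmed excerpts of the two captured bodies (long opaque fields omitted).
firefox_body = ("__EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT="
                "&Refinesearch%24txtKeywords=Python&Refinesearch%24ddlRadius=0"
                "&ddlCompanyType=0&ddlSort=1")
mechanize_body = ("__EVENTTARGET=srpPager%24btnForward&__EVENTARGUMENT="
                  "&Refinesearch%24txtKeywords=Python&Refinesearch%24ddlRadius=0"
                  "&Refinesearch%24btnSearch=Search&ddlCompanyType=0&ddlSort=1")

print(diff_form_bodies(firefox_body, mechanize_body))
# -> ([], ['Refinesearch$btnSearch'], [])
```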

Answer


The SearchSession cookies are quite different: the working one has

SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801 

and the non-working one has

SearchSession=SessionGuid=33e4e439-c2d6-423f-900f-574099310d5a 

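Do you have some way of independently verifying why the second one might not be accepted by the server? (That may not be the problem, but since the server is complaining about your SearchSession cookie, it seems like the first line of inquiry.)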
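
One way to follow up: extract the SessionGuid from the Cookie header you send and from the Set-Cookie header the server returns, and check whether they match. In the Firefox capture the server echoes back the same GUID (71de63de-...), so a different GUID coming back to mechanize would suggest the server discarded the session and started a new one. This is a heuristic under that assumption; the helper names below are hypothetical and use only the standard library.

```python
from urllib.parse import parse_qs


def parse_cookie_pairs(header):
    """Split a Cookie or Set-Cookie header into {name: value}.

    Each pair is split on the first '=' only, because values such as
    SessionGuid=...&LogSource=NAT contain '=' themselves. Set-Cookie
    attributes like path=/ simply show up as extra entries.
    """
    pairs = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            pairs[name] = value
    return pairs


def search_session_guid(header):
    """Return the SessionGuid inside the SearchSession cookie, or None."""
    value = parse_cookie_pairs(header).get("SearchSession")
    if value is None:
        return None
    return parse_qs(value).get("SessionGuid", [None])[0]


def session_kept(request_cookie, response_set_cookie):
    """True if the server echoed back the same SearchSession GUID."""
    sent = search_session_guid(request_cookie)
    return sent is not None and sent == search_session_guid(response_set_cookie)


# Values taken from the Firefox capture in the question.
sent = ("AnonymousUser=MemberId=acc079dd-66b6-4081-9b07-60d6955ee8bf&IsAnonymous=True; "
        "SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT")
echoed = "SearchSession=SessionGuid=71de63de-3bd0-4787-895d-b6b9e7c93801&LogSource=NAT; path=/"
print(search_session_guid(sent), session_kept(sent, echoed))
# -> 71de63de-3bd0-4787-895d-b6b9e7c93801 True
```

Applied to the mechanize Cookie header and whatever Set-Cookie the redirect response carries, a False result would point at the session itself rather than at the POST body.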