2016-01-01 74 views

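Using Python to scrape an ASP.NET site with an id in the URL

I am trying to scrape search results from this ASP.NET website by sending a POST request with Python requests. Even though I first use a GET request to obtain the __RequestVerificationToken and include it in my headers, all I get back is this reply: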

{"Token":"Y2VgsmEAAwA","Link":"/search/Y2VgsmEAAwA/"} 

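This is not a valid link. It points to the complete, unfiltered search results, without the arrival date or area that I included in my POST request. What am I missing? How do I scrape a website like this, which generates a (session?) ID for the URL?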
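For reference, the reply quoted above is JSON whose Link field is a site-relative path. A minimal sketch of turning it into an absolute URL is shown below; it only parses the quoted text and makes no request. The variable names are illustrative, and whether the resulting page reflects the search filters depends on the server accepting the POST body.

```python
import json
from urllib.parse import urljoin

# The reply body exactly as quoted in the question.
reply_text = '{"Token":"Y2VgsmEAAwA","Link":"/search/Y2VgsmEAAwA/"}'

reply = json.loads(reply_text)

# Resolve the site-relative Link against the URL that was posted to.
results_url = urljoin("http://www.feline.dk/search", reply["Link"])
print(results_url)
```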

Thank you all very much in advance!

My Python script:

import json 
import requests 
from bs4 import BeautifulSoup 

r = requests.Session() 

# GET request 
gr = r.get("http://www.feline.dk") 
bsObj = BeautifulSoup(gr.text,"html.parser") 
auth_string = bsObj.find("input", {"name": "__RequestVerificationToken"})['value'] 
#print(auth_string) 
#print(gr.url) 

# POST request 
search_request = { 
    "Geography.Geography":"Danmark", 
    "Geography.GeographyLong=":"Danmark (Ferieområde)", 
    "Geography.Id":"da509992-0830-44bd-869d-0270ba74ff62", 
    "Geography.SuggestionId": "", 
    "Period.Arrival":"16-1-2016", 
    "Period.Duration":7, 
    "Period.ArrivalCorrection":"false", 
    "Price.MinPrice":None, 
    "Price.MaxPrice":None, 
    "Price.MinDiscountPercentage":None, 
    "Accommodation.MinPersonNumber":None, 
    "Accommodation.MinBedrooms":None, 
    "Accommodation.NumberOfPets":None, 
    "Accommodation.MaxDistanceWater":None, 
    "Accommodation.MaxDistanceShopping":None, 
    "Facilities.SwimmingPool":"false", 
    "Facilities.Whirlpool":"false", 
    "Facilities.Sauna":"false", 
    "Facilities.InternetAccess":"false", 
    "Facilities.SatelliteCableTV":"false", 
    "Facilities.FireplaceStove":"false", 
    "Facilities.Dishwasher":"false", 
    "Facilities.WashingMachine":"false", 
    "Facilities.TumblerDryer":"false", 
    "update":"true" 
    } 


payload = { 
    "searchRequestJson": json.dumps(search_request), 
    } 


header ={ 
"Accept":"application/json, text/html, */*; q=0.01", 
"Accept-Encoding":"gzip, deflate", 
"Accept-Language":"da-DK,da;q=0.8,en-US;q=0.6,en;q=0.4", 
"Connection":"keep-alive", 
"Content-Length":"720", 
"Content-Type":"application/x-www-form-urlencoded; charset=UTF-8", 
"Cookie":"ASP.NET_SessionId=ebkmy3bzorzm2145iwj3bxnq; __RequestVerificationToken=" + auth_string + "; aid=382a95aab250435192664e80f4d44e0f; cid=google-dk; popout=hidden; __utmt=1; __utma=1.637664197.1451565630.1451638089.1451643956.3; __utmb=1.7.10.1451643956; __utmc=1; __utmz=1.1451565630.1.1.utmgclid=CMWOra2PhsoCFQkMcwod4KALDQ|utmccn=(not%20set)|utmcmd=(not%20set)|utmctr=(not%20provided); BNI_Feline.Web.FelineHolidays=0000000000000000000000009b84f30a00000000", 
"Host":"www.feline.dk", 
"Origin":"http://www.feline.dk", 
#"Referer":"http://www.feline.dk/search/Y2WZNDPglgHHXpe2uUwFu0r-JzExMYi6yif5KNswMDBwMDAAAA/", 
"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36", 
"X-Requested-With":"XMLHttpRequest" 
} 

gr = r.post(
    url = 'http://www.feline.dk/search', 
    data = payload, 
    headers = header 
    ) 

#print(gr.url) 
bsObj = BeautifulSoup(gr.text,"html.parser") 
print(bsObj) 

Any help, guys? Thanks! – Wessi

Answers


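After several attempts, I found that your search request is malformatted (it must be URL-encoded form data, not JSON) and that the cookie information is being overwritten by your headers (just let the session do that work).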
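To make the first point concrete, here is a small sketch comparing the two request bodies, using only the standard library and sending nothing over the network. It uses a shortened subset of the question's fields, and `to_form_fields` is a hypothetical helper introduced here, not part of requests. The assumption that empty fields should be sent as `key=` is taken from the URL-encoded string used in this answer.

```python
import json
from urllib.parse import parse_qs, urlencode

# A shortened subset of the question's search fields, for illustration only.
search_request = {
    "Geography.Geography": "Danmark",
    "Period.Arrival": "16-1-2016",
    "Period.Duration": 7,
    "Price.MinPrice": None,
}


def to_form_fields(fields):
    """Hypothetical helper: turn None into '' so the field is sent as 'key='."""
    return {key: "" if value is None else value for key, value in fields.items()}


# What the question sent: a single form field holding a JSON string.
json_wrapped = urlencode({"searchRequestJson": json.dumps(search_request)})

# What the answer sent: every search field as its own form field.
form_body = urlencode(to_form_fields(search_request))

print(json_wrapped)
print(form_body)
```

With the JSON-wrapped body the server sees only one field named searchRequestJson, so none of the Geography.* or Period.* values arrive under the names the form uses. Passing the dictionary returned by `to_form_fields` directly as `data=` to `requests` should produce the same kind of body, since requests form-encodes dictionaries.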

I simplified the code like this and got the desired result:

import requests
from bs4 import BeautifulSoup

r = requests.Session()

# GET request 
gr = r.get("http://www.feline.dk") 
bsObj = BeautifulSoup(gr.text,"html.parser") 
auth_string = bsObj.find("input", {"name": "__RequestVerificationToken"})['value'] 

# POST request 
search_request = "Geography.Geography=Hou&Geography.GeographyLong=Hou%2C+Danmark+(Ferieomr%C3%A5de)&Geography.Id=847fcbc5-0795-4396-9318-01e638f3b0f6&Geography.SuggestionId=&Period.Arrival=&Period.Duration=7&Period.ArrivalCorrection=False&Price.MinPrice=&Price.MaxPrice=&Price.MinDiscountPercentage=&Accommodation.MinPersonNumber=&Accommodation.MinBedrooms=&Accommodation.NumberOfPets=&Accommodation.MaxDistanceWater=&Accommodation.MaxDistanceShopping=&Facilities.SwimmingPool=false&Facilities.Whirlpool=false&Facilities.Sauna=false&Facilities.InternetAccess=false&Facilities.SatelliteCableTV=false&Facilities.FireplaceStove=false&Facilities.Dishwasher=false&Facilities.WashingMachine=false&Facilities.TumblerDryer=false" 

gr = r.post(
    url = 'http://www.feline.dk/search/', 
    data = search_request, 
    headers = {'Content-Type': 'application/x-www-form-urlencoded'} 
) 

print(gr.url) 

Result:

http://www.feline.dk/search/Y2U5erq-ZSr7NOfJEozPLD5v-MZkw8DAwMHAAAA/ 
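On the cookie point, the sketch below illustrates why a hard-coded Cookie header gets in the way. It only prepares requests and never sends them. The cookie value "fresh" stands in for whatever the server would set during the initial GET, and "stale" for a value copied from an old browser session; both are made up for this illustration.

```python
import requests

URL = "http://www.feline.dk/search/"

session = requests.Session()
# Stand-in for the cookie the server would set during the initial GET;
# the value "fresh" is invented for this sketch.
session.cookies.set("ASP.NET_SessionId", "fresh", domain="www.feline.dk", path="/")

# Case 1: no Cookie header given, so the session fills it in from its jar.
auto = session.prepare_request(requests.Request("POST", URL, data="a=1"))

# Case 2: a hard-coded Cookie header, as in the question's script.
manual = session.prepare_request(
    requests.Request(
        "POST", URL, data="a=1",
        headers={"Cookie": "ASP.NET_SessionId=stale"},
    )
)

print(auto.headers["Cookie"])
print(manual.headers["Cookie"])
```

In the second case the explicit header is kept and the cookies stored in the session are not added, so the server would be given the old session id rather than the one it just issued.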

Thank you very much, @Gaetan. I feel really stupid - I thought the problem was much more complicated. Again, thanks a bunch. – Wessi