我正在從python請求轉移到scrapy,我想提出一個請求,單擊instagram hashtag頁面底部的按鈕。Scrapy Post Data
捲曲是這個
curl "https://www.instagram.com/query/" -H "cookie: mid=VwBJIwAEAAGiVNY3epWm9pRgD9Ge; fbm_124024574287414=base_domain=.instagram.com; ig_pr=1; ig_vw=956; s_network=; fbsr_124024574287414=5HQEzU7XMqOLO4KeQMmSvyBcKsH2svemV1-nWIE4_iM.eyJhbGdvcml0aG0iOiJITUFDLVNIQTI1NiIsImNvZGUiOiJBUUQ0TnNLMjVCZmdvUFN4TjdfODNQaW81Z3U4MTNaZmZWVlNCcEdJNUdRWlczdmdfNGVXNXJyck5Sc3NXRFlSWjZiZEpWMU95V3hNUUcwSE9qMHItYlRiYk40VXpNZG5aLUJ5Zzk0VWZNSW1RZTd4R1JzTS1yaXRabmc0Z3FYNkpwbnF4b0VXajRPNEVGSDVoTXBCUFNHUGNHN0RHQ01uSjFLeXh1dllOc2cyaFpnSDFheVI0RUhMbE1nZGM4emVrNm9DXzdLa2s1TUoyYzhyYmEwWXo1VkI1bVVmS3NvLS11dXVxdjJlRmxFUHpYczVNQ3E1bW5BRk5IeWxxMG9veENQcXcwWUVLSnpsNnZSUzFReGUzQWpsQzFPU0cySU1QM0wwMGhUcnRraFF4ZEFhZElVMUtNNUw5VTRab2dlbjltdUFadkJjV0U3UUMxeTdibDRyTzhwWCIsImlzc3VlZF9hdCI6MTQ3NDkzODQ3MywidXNlcl9pZCI6IjEzNzc3ODgzNjkifQ; csrftoken=th33gPnvrsNS74reomY69ETfojX2avQ7" -H "origin: https://www.instagram.com" -H "accept-encoding: gzip, deflate, br" -H "accept-language: en-US,en;q=0.8" -H "user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36" -H "x-requested-with: XMLHttpRequest" -H "x-csrftoken: th33gPnvrsNS74reomY69ETfojX2avQ7" -H "x-instagram-ajax: 1" -H "content-type: application/x-www-form-urlencoded" -H "accept: */*" -H "referer: https://www.instagram.com/explore/tags/love/" -H "authority: www.instagram.com" --data "q=ig_hashtag(love)+"%"7B+media.after(J0HV-nGYwAAAF0HV-nGXAAAAFjgA"%"2C+10)+"%"7B"%"0A++count"%"2C"%"0A++nodes+"%"7B"%"0A++++caption"%"2C"%"0A++++code"%"2C"%"0A++++comments+"%"7B"%"0A++++++count"%"0A++++"%"7D"%"2C"%"0A++++comments_disabled"%"2C"%"0A++++date"%"2C"%"0A++++dimensions+"%"7B"%"0A++++++height"%"2C"%"0A++++++width"%"0A++++"%"7D"%"2C"%"0A++++display_src"%"2C"%"0A++++id"%"2C"%"0A++++is_video"%"2C"%"0A++++likes+"%"7B"%"0A++++++count"%"0A++++"%"7D"%"2C"%"0A++++owner+"%"7B"%"0A++++++id"%"0A++++"%"7D"%"2C"%"0A++++thumbnail_src"%"2C"%"0A++++video_views"%"0A++"%"7D"%"2C"%"0A++page_info"%"0A"%"7D"%"0A+"%"7D&ref=tags"%"3A"%"3Ashow" --compressed
所以對我已經試過兩件事情表單數據:
body = response.xpath("//body")
html = str(body.extract())
end_cursor = re.search(r"\"end\_cursor\"\: \"(.+?)\"", html).group(1)
data = "q=ig_hashtag({})+%7B+media.after({}+10)+%7B%0A++count%2C%0A++nodes+%7B%0A++++caption%2C%0A++++code%2C%0A++++comments+%7B%0A++++++count%0A++++%7D%2C%0A++++comments_disabled%2C%0A++++date%2C%0A++++dimensions+%7B%0A++++++height%2C%0A++++++width%0A++++%7D%2C%0A++++display_src%2C%0A++++id%2C%0A++++is_video%2C%0A++++likes+%7B%0A++++++count%0A++++%7D%2C%0A++++owner+%7B%0A++++++id%0A++++%7D%2C%0A++++thumbnail_src%2C%0A++++video_views%0A++%7D%2C%0A++page_info%0A%7D%0A+%7D&ref=tags%3A%3Ashow".format(tag, end_cursor)
url = 'https://www.instagram.com/query/'
yield Request(url, body=data, method="POST", callback=self.parseHashtag)
這
data = {"q" :"ig_hashtag({})+%7B+media.after({}+10)+%7B%0A++count%2C%0A++nodes+%7B%0A++++caption%2C%0A++++code%2C%0A++++comments+%7B%0A++++++count%0A++++%7D%2C%0A++++comments_disabled%2C%0A++++date%2C%0A++++dimensions+%7B%0A++++++height%2C%0A++++++width%0A++++%7D%2C%0A++++display_src%2C%0A++++id%2C%0A++++is_video%2C%0A++++likes+%7B%0A++++++count%0A++++%7D%2C%0A++++owner+%7B%0A++++++id%0A++++%7D%2C%0A++++thumbnail_src%2C%0A++++video_views%0A++%7D%2C%0A++page_info%0A%7D%0A+%7D&ref=tags%3A%3Ashow".format(tag, end_cursor)}
yield FormRequest(url, formdata=data, callback=self.parseHashtag)
我得到一個403錯誤,所以我我顯然發送的數據不正確,我是不正確地格式化數據還是錯誤地調用帖子?這是我的兩個想法,但我很不確定。任何幫助將非常感激,謝謝。
的網址是這樣的 - https://www.instagram.com/explore/tags/love/
這是我的混帳,https://github.com/Fuledbyramen/instagram_crawler/blob/master/instagram/spiders/instagram_spider.py
哦,我實際上添加的報頭,但只是想知道如何發送數據,這是我加入了其中的一些:CSRF = re.search(R「csrftoken \ =(+ ?)\;「,str(response.headers))。group(1)response.headers ['x-csrftoken'] = csrf – Fuledbyramen
FormRequest幾乎只是將method ='POST'的請求,它將formdata中的值轉換爲請求身體。所以只要有正確的標題和正文的Request(method ='POST')就足夠了。您還可以通過簡單地使用response.headers.get('csrftoken')來優化您的頁眉搜索,因爲scrapy Response已經很好地爲您設置了一切格式。 – Granitosaurus
那麼我會正確格式化第一個請求嗎?我可能堅持使用Request而不使用FormRequest,因爲cURL不給出字典,只是一個字符串 – Fuledbyramen