POST請求總是返回「不允許的關鍵字符」

我想從表格中檢索大氣顆粒物的值（可惜這個網站不是英文的，所以隨便問一切）：BeautifulSoup和GET的組合失敗與requests一起發送的請求，因爲表格中填充了Bootstrap，因此解析器BeautifulSoup找不到仍然必須插入的值。POST請求總是返回「不允許的關鍵字符」

使用Firebug我檢查了頁面的每一個角度，我發現，通過選擇表中不同的一天，一個POST請求被髮送（該網站，你可以在Referer看，是http://www.arpat.toscana.it/temi-ambientali/aria/qualita-aria/bollettini/index/regionale/，所在的表是）：

POST /temi-ambientali/aria/qualita-aria/bollettini/aj_dati_bollettini HTTP/1.1 
Host: www.arpat.toscana.it  
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0 
Accept: */*  
Accept-Language: en-US,en;q=0.5 
Accept-Encoding: gzip, deflate 
Content-Type: application/x-www-form-urlencoded; charset=UTF-8 
X-Requested-With: XMLHttpRequest 
Referer: http://www.arpat.toscana.it/temi-ambientali/aria/qualita-aria/bollettini/index/regionale/26-12-2016 
Content-Length: 114 
Cookie: [...] 
DNT: 1 
Connection: keep-alive

用下面的PARAMS：

v_data_osservazione=26-12-2016&v_tipo_bollettino=regionale&v_zona=&csrf_test_name=b88d2517c59809a529 
b6f8141256e6ca

數據在答案是JSON格式。

所以我開始製作我的個人POST請求，以便直接獲取將填滿表格的JSON數據。

在params中，除了日期之外，需要csrf_test_name：在這裏我發現這個網站是受保護的，不受CSRF vulnerability;爲了在params中執行正確的查詢，我需要一個CSRF令牌：這就是爲什麼我對該站點執行GET請求（請參閱URL的POST請求中的Referer），並從此cookie獲取CSRF令牌：

r = get(url) 
csrf_token = r.cookies["csrf_cookie_name"]

在一天結束的時候，我的CSRF令牌和POST請求準備好了，我發送它......並且狀態碼爲200，我總是得到Disallowed Key Characters.！

尋找這個錯誤，我總是看到關於CodeIgniter的帖子，我認爲這不是我所需要的：我嘗試了頭和參數的每個組合，但沒有任何改變。在放棄BeautifulSoup和requests並開始學習Selenium之前，我想弄清楚問題是什麼：Selenium是太高級別，像BeautifulSoup和requests這樣的低級庫讓我學習了很多有用的東西，所以我更喜歡繼續學習這兩個。

下面的代碼：

from requests import get, post 
from bs4 import BeautifulSoup 
import datetime 
import json 

url = "http://www.arpat.toscana.it/temi-ambientali/aria/qualita-aria/bollettini/index/regionale/" # + %d-%m-%Y 
yesterday = datetime.date.today() - datetime.timedelta(1) 
date_object = datetime.datetime.strptime(str(yesterday), '%Y-%m-%d') 
yesterday_string = str(date_object.strftime('%d-%m-%Y')) 

full_url = url + yesterday_string 
print("REFERER " + full_url) 

r = get(url) 
csrf_token = r.cookies["csrf_cookie_name"] 
print(csrf_token) 

# preparing headers for POST request 
headers = { 
    "Host": "www.arpat.toscana.it", 
    "Accept" : "*/*", 
    "Accept-Language" : "en-US,en;q=0.5", 
    "Accept-Encoding" : "gzip, deflate", 
    "Content-Type" : "application/x-www-form-urlencoded; charset=UTF-8", 
    "X-Requested-With" : "XMLHttpRequest", # XHR 
    "Referer" : full_url, 
    "DNT" : "1", 
    "Connection" : "keep-alive" 
} 

# preparing POST parameters (to be inserted in request's body) 
payload_string = "v_data_osservazione="+yesterday_string+"&v_tipo_bollettino=regionale&v_zona=&csrf_test_name="+csrf_token 
print(payload_string) 

# data -- (optional) Dictionary, bytes, or file-like object to send in the body of the Request. 

# json -- (optional) json data to send in the body of the Request. 
req = post("http://www.arpat.toscana.it/temi-ambientali/aria/qualita-aria/bollettini/aj_dati_bollettini", 
    headers = headers, json = payload_string 
) 

print("URL " + req.url) 

print("RESPONSE:") 
print('\t'+str(req.status_code)) 
print("\tContent-Encoding: " + req.headers["Content-Encoding"]) 
print("\tContent-type: " + req.headers["Content-type"]) 
print("\tContent-Length: " + req.headers["Content-Length"]) 
print('\t'+req.text)

來源

2016-12-28 elmazzun

如果你想保持學習和使用您的必要請求必需品，但在cookies，推薦和標題管理的幫助下，我會建議你檢查['scrapy']（https://scrapy.org） – eLRuLL

我想知道我的代碼有什麼問題，但是scrapy '可能值得一試。我現在要把'pip'上班，謝謝你的建議。 – elmazzun

您可以使用http://httpbin.org發送'POST'併發送它接收到的所有數據 - 然後您可以將其與瀏覽器發送到服務器的數據進行比較。它有助於找出請求中的差異。 – furas

此代碼的工作對我來說：

我用request.Session()，是它讓所有的cookies
我用的data=代替json=
終於我不不需要所有評論元素
比較瀏覽器請求和代碼要求我用Charles網絡調試代理應用

代碼：

import requests 
import datetime 

#proxies = { 
# 'http': 'http://localhost:8888', 
# 'https': 'http://localhost:8888', 
#} 

s = requests.Session() 
#s.proxies = proxies # for test only 

date = datetime.datetime.today() - datetime.timedelta(days=1) 
date = date.strftime('%d-%m-%Y') 

# --- main page --- 

url = "http://www.arpat.toscana.it/temi-ambientali/aria/qualita-aria/bollettini/index/regionale/" 

print("REFERER:", url+date) 

r = s.get(url) 

# --- data --- 

csrf_token = s.cookies["csrf_cookie_name"] 

#headers = { 
    #'User-Agent': 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0', 
    #"Host": "www.arpat.toscana.it", 
    #"Accept" : "*/*", 
    #"Accept-Language" : "en-US,en;q=0.5", 
    #"Accept-Encoding" : "gzip, deflate", 
    #"Content-Type" : "application/x-www-form-urlencoded; charset=UTF-8", 
    #"X-Requested-With" : "XMLHttpRequest", # XHR 
    #"Referer" : url, 
    #"DNT" : "1", 
    #"Connection" : "keep-alive" 
#} 

payload = { 
    'csrf_test_name': csrf_token, 
    'v_data_osservazione': date, 
    'v_tipo_bollettino': 'regionale', 
    'v_zona': None, 
} 

url = "http://www.arpat.toscana.it/temi-ambientali/aria/qualita-aria/bollettini/aj_dati_bollettini" 
r = s.post(url, data=payload) #, headers=headers) 

print('Status:', r.status_code) 
print(r.json())

代理：

來源

2016-12-28 19:06:07 furas

確實有用，謝謝！ 'Session'和'Charles'分開，看起來我錯誤地將參數傳遞給了POST請求。 – elmazzun

這是有問題的日期 - 目前還沒有關於「28-12-2016」的信息。 – furas

你是對的，我意識到數據仍然被加載後刪除了評論，我刪除了評論：這是凌晨2點，我很累xD不是很有禮貌，因爲你設法閱讀並再次回答我，所以謝謝又是你 – elmazzun

POST請求總是返回「不允許的關鍵字符」

回答

相關問題