Legitimize a web request so that the server lets it through

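I have been trying to run the code below, but it keeps producing HTTP Error 502. I think the error occurs because the website knows that a program is trying to fetch information from it, so it refuses the request. Is there a way to trick the server into thinking this is a legitimate web request? I have tried adding headers, but it still does not work.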

import urllib.request 


# Function: Download an HTML document and write its contents to a text file
# (mode 'w' overwrites the file on every call)
def html_to_text(source_html, target_file):

    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    print(source_html)
    with opener.open(source_html) as r:
        response = r.read()
    print(response)
    with open(target_file, 'w', encoding='utf-8') as temp_file:
        # Decode the bytes; str(response) would write the literal "b'...'" repr
        temp_file.write(response.decode('utf-8', errors='replace'))


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0" 
target_location = "C:\\Users\\Admin\\PycharmProjects\\TheLastPuff\\Source\\yahoo_ticker_symbols.txt" 

html_to_text(source_address, target_location)

2016-01-02 Cloud
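Before concluding that the site is blocking the script, it can help to look at what the server actually sent back. HTTP 502 means "Bad Gateway" (a gateway or proxy received an invalid response from the server behind it), so it is not necessarily a sign of being blocked. The sketch below catches `urllib.error.HTTPError`, which doubles as a response object, and prints its status, reason and headers. `fetch` and its `opener` parameter are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request


def fetch(url, headers, opener=None):
    """Return (status_code, body_bytes) instead of raising on HTTP errors.

    Hypothetical helper: `opener` defaults to a fresh urllib opener and can
    be replaced by a stub object with an `open(request)` method for testing.
    """
    opener = opener or urllib.request.build_opener()
    request = urllib.request.Request(url, headers=headers)
    try:
        with opener.open(request) as response:
            return response.getcode(), response.read()
    except urllib.error.HTTPError as err:
        # An HTTPError carries the status code, the response headers and,
        # usually, an error page as its body
        print(err.code, err.reason)
        print(err.headers)
        return err.code, err.read()
```

Usage: `status, body = fetch(source_address, {'User-Agent': 'Mozilla/5.0'})`, then inspect `body` to see whether the 502 page comes from a proxy or from the site itself.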

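It works for me. How many requests have you sent them? They may have detected your experiments as a brute-force or DoS attack and blacklisted some fingerprint of your requests. –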
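If the volume of requests is the problem, spacing them out is the simplest mitigation. A minimal sketch, assuming the script fetches several pages in a loop; the `Throttle` class and its 2-second default are hypothetical choices, not a limit published by Yahoo.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._last = None  # time.monotonic() of the previous request

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Usage: create one `Throttle()` and call `throttle.wait()` immediately before each `opener.open(...)`. A monotonic clock is used so that system clock adjustments cannot shorten the delay.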

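Is there a way to trick the server into thinking the request comes from a legitimate browser? – Cloud

Yes. Capture the traffic your browser sends and copy its header values into your Python script. –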
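Copying headers by hand is error-prone, so one option is to paste the "Name: value" block from the browser's network panel into the script and parse it. A sketch, assuming that paste format; `headers_from_devtools` and `BROWSER_HEADERS` are hypothetical names, and the values are the Firefox headers used in the answer below.

```python
import urllib.request

# Request headers as copied from the browser's network panel
BROWSER_HEADERS = """
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Host: sg.finance.yahoo.com
"""


def headers_from_devtools(raw):
    """Turn a 'Name: value' block copied from browser dev tools into a dict."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(':')
        name = name.strip()
        # Skip blank/malformed lines and HTTP/2 pseudo-headers such as ":authority"
        if not sep or not name:
            continue
        # urllib adds a Host header from the URL when none is set, so drop the copy
        if name.lower() == 'host':
            continue
        headers[name] = value.strip()
    return headers


request = urllib.request.Request(
    'https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0',
    headers=headers_from_devtools(BROWSER_HEADERS),
)
```

One caution: urllib does not decompress responses automatically, so copying the browser's `Accept-Encoding` header is only safe if the script also decompresses the body.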

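I changed the code to send a full set of browser headers and to decompress and decode the response, and that achieved what I wanted: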

import urllib.request 
import gzip 


# Function: Download an HTML document and write its text to a file
# (mode 'w' overwrites the file on every call)
def html_to_text(source_html, target_file):

    opener = urllib.request.build_opener()
    # These headers are copied from Firefox so that the website thinks a browser,
    # not a program, is retrieving the information.
    # Only gzip is advertised because gzip is the only encoding handled below.
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'),
         ('Connection', 'keep-alive'),
         ('Accept-encoding', 'gzip'),
         ('Accept-language', 'en-US,en;q=0.5'),
         ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
         ('Host', 'sg.finance.yahoo.com'), ]
    with opener.open(source_html) as r:
        response = r.read()

        # Check the "Content-Encoding" response header (also visible in Firebug)
        # and decompress only if the content really is gzip-encoded
        if r.headers.get('Content-Encoding', '').lower() == 'gzip':
            response = gzip.decompress(response)

        # The "charset" parameter of the Content-Type response header gives the
        # text encoding; fall back to UTF-8 if the server does not state one
        charset = r.headers.get_content_charset() or 'utf-8'

    response = response.decode(encoding=charset)
    print(response)
    with open(target_file, 'w', encoding='utf-8') as temp_file:
        temp_file.write(response)


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0" 
target_location = "C:\\Users\\Admin\\PycharmProjects\\TheLastPuff\\Source\\yahoo_ticker_symbols.txt" 

html_to_text(source_address, target_location)

2016-01-05 15:52:06 Cloud
