Code keeps producing an empty string


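I don't understand why the code below keeps producing an empty string. I'm trying to have it extract a website's content into a .txt file, but both the printed output and the file come out empty. Is there an error in the code?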

import urllib3 
import certifi 


# Function: Convert information within html document to a text file 
# Append information to the file 
def html_to_text(source_html, target_file): 

    http = urllib3.PoolManager(
     cert_reqs='CERT_REQUIRED',  # Force certificate check. 
     ca_certs=certifi.where(),  # Path to the Certifi Bundle 
     headers={'connection': 'keep-alive', 'user-agent': 'Mozilla/5.0', 'accept-encoding': 'gzip, deflate'}, 
    ) 

    r = http.urlopen('GET', source_html) 
    print(source_html) 
    response = r.read().decode('utf-8') 
    # TODO: Find the problem that keeps making the code produce an empty string 
    print(response) 
    temp_file = open(target_file, 'w+') 
    temp_file.write(response) 


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0" 
target_location = "C:\\Users\\Admin\\PycharmProjects\\TheLastPuff\\Source\\yahoo_ticker_symbols.txt" 

html_to_text(source_address, target_location)

Source

2016-01-05 Cloud

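When you say "produce", do you mean "print", "write to the file", or both? What do `print(source_html)` and `print(response)` print? – Kevin

Neither the print call nor the write call produces anything. `print(source_html)` does print `source_address` correctly. – Cloud

The `r` object appears to have an `r.data` attribute that holds the response body. http://urllib3.readthedocs.org/en/latest/#usage – Jasper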

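I get a response with the code below. The only relevant change is using r.data instead of r.read(). By default urllib3 preloads the whole response body into r.data (preload_content=True), so a later r.read() has nothing left to return and yields an empty result.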

import urllib3 
import certifi 


def html_to_text(source_html): 

    http = urllib3.PoolManager(
     cert_reqs='CERT_REQUIRED',  # Force certificate check. 
     ca_certs=certifi.where(),  # Path to the Certifi Bundle 
     headers={'connection': 'keep-alive', 'user-agent': 'Mozilla/5.0', 'accept-encoding': 'gzip, deflate'}, 
    ) 

    r=http.urlopen('GET', source_html) 
    print(source_html) 
    print(r.headers) 
    response = r.data     # instead of read().decode('utf-8') 
    print(response) 


source_address = "https://sg.finance.yahoo.com/lookup/all?s=*&t=A&m=SG&r=&b=0" 

html_to_text(source_address)

Versions used:

>>> certifi.__version__ 
'2015.11.20.1' 
>>> urllib3.__version__ 
'1.14' 
>>> sys.version 
'3.5.1 (default, Dec 7 2015, 12:58:09) \n[GCC 5.2.0]'
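
To get back to the original goal of saving the page as a text file, the r.data approach could be extended roughly as follows. This is only a sketch: `fetch_body` and `save_text` are hypothetical helper names, the header set is trimmed to the user agent, and UTF-8 is assumed as the page encoding (undecodable bytes are replaced rather than raising).

```python
def fetch_body(url):
    """Fetch url and return the raw response body as bytes."""
    # Imported lazily so save_text() below stays usable without urllib3.
    import certifi
    import urllib3

    http = urllib3.PoolManager(
        cert_reqs="CERT_REQUIRED",   # Force certificate check.
        ca_certs=certifi.where(),    # Path to the Certifi bundle.
        headers={"user-agent": "Mozilla/5.0"},
    )
    r = http.request("GET", url)
    # The body is preloaded by default, so read it from r.data, not r.read().
    return r.data


def save_text(body, target_file, encoding="utf-8"):
    """Decode body (bytes) and write it to target_file.

    Returns the number of characters written. The encoding is an
    assumption; adjust it if the server declares a different charset.
    """
    text = body.decode(encoding, errors="replace")
    # The with-block closes the file, so the content is flushed to disk.
    with open(target_file, "w", encoding=encoding) as f:
        return f.write(text)
```

A call such as `save_text(fetch_body(source_address), target_location)` would then replace the body of `html_to_text` from the question.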

Source

2016-01-05 13:52:47 Jasper

This code seems to work, but now I get a different error: "urllib.error.HTTPError: HTTP Error 502: Server Hangup". I think the website is kicking me out. – Cloud
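
A 502 is reported by the server side, so one thing that might be worth trying is a small retry loop with a pause between attempts. The sketch below assumes the failure is transient, which this thread does not confirm; if the site is deliberately rejecting the client, retrying will not help. It uses the standard-library urllib that the error message comes from, and `fetch_with_retry`, `attempts`, `delay`, and `opener` are hypothetical names chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=3, delay=1.0, opener=urllib.request.urlopen):
    """Return the response body as bytes, retrying on HTTP 5xx errors."""
    for attempt in range(1, attempts + 1):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise client errors (4xx) immediately, and re-raise
            # server errors once the last attempt has failed.
            if err.code < 500 or attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

The `opener` parameter exists only so the function can be exercised without network access; in normal use the default `urllib.request.urlopen` is enough.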
