Python - Different results retrieved when using curl and the requests library

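I am trying to build a Python crawler with the requests library. When I use the get method, the result I retrieve looks like this: THá» THAO. But when I use curl, I get THỂ THAO, which is the result I expect. Here is my code: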

import requests
from bs4 import BeautifulSoup


def get_raw_channel():
    # Fetch the page; r.text is decoded with whatever encoding requests inferred
    r = requests.get('http://vtv.vn/')
    raw_html = r.text
    soup = BeautifulSoup(raw_html)
    o_tags = soup.find_all("option")
    for o_tag in o_tags:
        print(o_tag.text)
        # raw_channel = RawChannel(o_tag.text.strip(), o_tag['value'])
        # channels_file.write(raw_channel.__str__() + '\n')

Here is my curl command: curl http://vtv.vn/

Q: Why are the results different? How can I get the same result as curl using requests?

Source

2015-02-09 mr.icetea

What is the encoding of the response body? – 2015-02-09 08:17:05

@LutzHorn 'Date: Mon, 09 Feb 2015 07:59:34 GMT, Content-Type: text/html, Transfer-Encoding: chunked, Connection: close, Vary: Accept-Encoding, Server: vtv-rp' is the curl response header. And '{'via': '1.1 TMG', 'proxy-connection': 'Keep-Alive', 'transfer-encoding': 'chunked', 'vary': 'Accept-Encoding', 'server': 'vtv-rp', 'connection': 'Keep-Alive', 'date': 'Mon, 09 Feb 2015 08:19:52 GMT', 'content-type': 'text/html'}' is the requests response header. – 2015-02-09 08:20:25

@LutzHorn I don't see an encoding in the response, but I think it is 'utf-8'. – 2015-02-09 08:22:28
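
Neither set of headers quoted above declares a charset in Content-Type. For text/* responses without a charset, requests falls back to ISO-8859-1 when building r.text, which would explain the garbled output if the page is really UTF-8. The following is a minimal offline sketch in Python 3 (no network access, and the sample string is taken from the question) that reproduces the same garbling by decoding UTF-8 bytes as ISO-8859-1:

```python
# Assumption: the server sends UTF-8 bytes, as the asker suspects.
expected = "THỂ THAO"
utf8_bytes = expected.encode("utf-8")

# Decoding those bytes as ISO-8859-1 maps every byte to one character,
# so the three-byte sequence for "Ể" becomes three unrelated characters.
misdecoded = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)
print(repr(misdecoded))
```

The third character produced from "Ể" is an invisible control character, which is why only "á»" shows up in the printed output.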

I tried your code, and in my case the encoding was "ISO-8859-1". Try encoding your data to UTF-8 before passing it to BS, like this:

... 
raw_html = r.text.encode("utf-8") 
soup = BeautifulSoup(raw_html) 
...
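
One possible reason this step alone may not change the output: if r.text was already decoded with the wrong codec, encoding it as UTF-8 preserves the garbled characters rather than the original bytes. The sketch below is an offline Python 3 illustration under that assumption; misdecoded_text is a hypothetical stand-in for r.text.

```python
expected = "THỂ THAO"

# Hypothetical stand-in for r.text: UTF-8 bytes decoded as ISO-8859-1.
misdecoded_text = expected.encode("utf-8").decode("iso-8859-1")

# Encoding the garbled text as UTF-8 keeps the garbling intact.
double_encoded = misdecoded_text.encode("utf-8")
still_garbled = double_encoded.decode("utf-8")

# Undoing the wrong decode first recovers the original text.
repaired = misdecoded_text.encode("iso-8859-1").decode("utf-8")

print(still_garbled == misdecoded_text)
print(repaired == expected)
```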

UPDATE: I ran some tests, and everything seems to work for me when I explicitly set the encoding on the response. Take a look:

In [1]: import requests 
In [2]: from BeautifulSoup import BeautifulSoup 
In [3]: r = requests.get('http://vtv.vn/') 
In [4]: r.encoding = "utf-8" 
In [5]: raw_html = r.text 
In [6]: soup = BeautifulSoup(raw_html) 
In [7]: soup.findAll("option") 
Out[7]: 
[<option value="1"> 
VTV1</option>, 
... stripped out some output ... 

VTVCab3 - Thể thao TV</option>, 
<option value="13"> 

... stripped out some output ... 
]
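
Setting r.encoding = "utf-8" hard-codes the charset. As a rough alternative, the sketch below (Python 3, standard library only) uses the charset from Content-Type when one is declared and falls back to UTF-8 otherwise. The helper names charset_from_content_type and decode_body are hypothetical, and the UTF-8 default is an assumption that happens to suit this site, not a general rule.

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset declared in a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw response bytes with the declared charset, else UTF-8."""
    return body.decode(charset_from_content_type(content_type))


# Simulated response: UTF-8 bytes served as plain "text/html", as in the question.
body = "<option>THỂ THAO</option>".encode("utf-8")
html = decode_body(body, "text/html")
print(html)
```

With requests, the same idea would be applied to the raw bytes, for example decode_body(r.content, r.headers.get("content-type", "text/html")). Another option that may work is r.encoding = r.apparent_encoding, which asks requests to guess the charset from the body.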

Source

2015-02-09 08:59:08 artemdevel

Thanks for your answer, but it doesn't work for me :( – 2015-02-09 09:07:38

Wow! It works, thank you very much. – 2015-02-09 09:39:46
