urllib2.quote does not work properly

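I am trying to fetch the HTML of a page whose URL contains diacritics (í, č...). The problem is that urllib2.quote does not seem to work the way I expect.

As far as I understand, quote should convert a URL containing diacritics into a valid URL.

Here is an example: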

url = 'http://www.example.com/vydavatelství/' 

print urllib2.quote(url) 

>> http%3A//www.example.com/vydavatelstv%C3%AD/

The problem is that, for some reason, it changes the http:// part of the string. urllib2.urlopen(req) then fails with an error:

response = urllib2.urlopen(req)
File "C:\Python27\lib\urllib2.py", line 154, in urlopen return opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 437, in open response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 550, in http_response 'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 475, in error return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 409, in _call_chain result = func(*args)
File "C:\Python27\lib\urllib2.py", line 558, in http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request

Source

2015-04-12 Milano Slesarik

Have you tried putting # -*- coding: utf-8 -*- at the top of your script? – thefragileomen

- TL;DR -

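Two things. First, make sure you include the encoding declaration (# -*- coding: utf-8 -*-) at the top of your Python script. This tells Python how the text in the file is encoded. Second, you need to specify the safe characters, that is, the characters the quote method must not convert. By default only / is treated as safe. This means the : is being converted to %3A, and that is what breaks your URL.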

url = 'http://www.example.com/vydavatelství/' 
urllib2.quote(url,':/') 
>>> http://www.example.com/vydavatelstv%C3%AD/
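For readers on Python 3, where urllib2 no longer exists, the same behaviour can be sketched with urllib.parse.quote. This is only a minimal sketch, not the code from the answer: the helper name quote_url_path is hypothetical, and quoting just the path component is shown as one possible alternative to passing ':/' as the safe string.

```python
# -*- coding: utf-8 -*-
from urllib.parse import quote, urlsplit, urlunsplit

url = 'http://www.example.com/vydavatelství/'

# With the default safe='/', the colon after the scheme is percent-encoded.
broken = quote(url)

# Marking ':' as safe as well keeps the scheme separator intact.
fixed = quote(url, safe=':/')


def quote_url_path(raw_url):
    """Hypothetical helper: quote only the path, leaving scheme and host alone."""
    parts = urlsplit(raw_url)
    return urlunsplit(parts._replace(path=quote(parts.path)))


print(broken)               # http%3A//www.example.com/vydavatelstv%C3%AD/
print(fixed)                # http://www.example.com/vydavatelstv%C3%AD/
print(quote_url_path(url))  # http://www.example.com/vydavatelstv%C3%AD/
```

Quoting only the path avoids having to list every reserved character that may legitimately appear elsewhere in the URL, such as the ':' before a port number.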

- More here -

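The first problem here is that the documentation for urllib2 is fairly poor. Following the link Kamal provided, I cannot find the quote method in the docs at all, which makes troubleshooting rather difficult.

That said, let me explain a little.

urllib2.quote appears to be the same function as urllib's quote, which is documented pretty well. Note that in Python 2 it accepts only string and safe; the four-parameter signature below is that of Python 3's urllib.parse.quote: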

urllib.parse.quote(string, safe='/', encoding=None, errors=None)
## string: the string you're trying to encode
## safe: string containing the characters to leave unquoted. Default is '/'
## encoding: the encoding used for non-ASCII characters. Default is utf-8
## errors: specifies how encoding errors are handled. Default is 'strict', which raises a UnicodeEncodeError.
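The effect of each parameter can be seen in a short experiment. This is a sketch that assumes Python 3's urllib.parse.quote, whose signature is shown above; the sample strings and variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
from urllib.parse import quote

text = 'vydavatelství'

# encoding defaults to UTF-8, so 'í' becomes two percent-encoded bytes.
utf8_quoted = quote(text)

# In Latin-1 the same character is a single byte.
latin1_quoted = quote(text, encoding='latin-1')

# An empty safe string means '/' is percent-encoded too.
no_safe = quote('a b/c', safe='')

# errors defaults to 'strict', so an unencodable character raises.
try:
    quote(text, encoding='ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors='replace' substitutes '?', which is then percent-encoded.
replaced = quote(text, encoding='ascii', errors='replace')

print(utf8_quoted)    # vydavatelstv%C3%AD
print(latin1_quoted)  # vydavatelstv%ED
print(no_safe)        # a%20b%2Fc
print(strict_failed)  # True
print(replaced)       # vydavatelstv%3F
```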

Source

2015-06-05 19:01:25
