2014-11-05

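Python urllib2 and unicode

I want to collect information from the results returned by a search engine, but I can only put plain text in the query part of the URL, not unicode.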

import urllib2 
a = "바둑" 
a = a.decode("utf-8") 
type(a) 
#Out[35]: unicode 

url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a) 
url2 = urllib2.urlopen(url) 

This gives the following error:

#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128) 

Answer

Encode the Unicode data as UTF-8, then URL-encode it:

from urllib import urlencode 
import urllib2 

params = {'where': 'nexearch', 'query': a.encode('utf8')} 
params = urlencode(params) 

url = "http://search.naver.com/search.naver?" + params 
response = urllib2.urlopen(url) 

Demo:

>>> from urllib import urlencode 
>>> a = u"바둑" 
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')} 
>>> params = urlencode(params) 
>>> params 
'query=%EB%B0%94%EB%91%91&where=nexearch' 
>>> url = "http://search.naver.com/search.naver?" + params 
>>> url 
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch' 
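
To make the two steps explicit, here is a minimal sketch that reproduces the failure offline and then does the encoding by hand. It assumes the error in the question comes from the unicode URL being implicitly encoded with the ASCII codec. The try/except import is only there so the snippet also runs on Python 3, where these functions live in urllib.parse.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote_plus    # Python 3

query = u"바둑"

# Non-ASCII text cannot be encoded with the ASCII codec; this is the
# same exception type as the one reported in the question.
try:
    query.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Step 1: Unicode text -> UTF-8 bytes (three bytes per character here).
utf8_bytes = query.encode("utf-8")

# Step 2: percent-encode every non-ASCII byte so the result is pure ASCII.
escaped = quote_plus(utf8_bytes)

print(ascii_failed)     # True
print(len(utf8_bytes))  # 6
print(escaped)          # %EB%B0%94%EB%91%91
```

The escaped value matches the query parameter shown in the demo above.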

Using urllib.urlencode() to build the parameters is easier, but you can also escape just the query value with urllib.quote_plus():

from urllib import quote_plus 
encoded_a = quote_plus(a.encode('utf8')) 
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
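
If you build such URLs in several places, the approach can be wrapped in a small function. The sketch below is one way to do it, not the only one: build_search_url and BASE_URL are hypothetical names introduced for illustration, and the import fallback is an assumption added so the code also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import urlencode          # Python 2
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Endpoint taken from the question.
BASE_URL = "http://search.naver.com/search.naver"


def build_search_url(query, where="nexearch"):
    """Return an ASCII-only search URL for a unicode query string.

    build_search_url is a hypothetical helper, not part of urllib.
    """
    # Encode explicitly to UTF-8 bytes, as in the answer above, rather
    # than relying on any default codec.
    params = [("where", where), ("query", query.encode("utf-8"))]
    # A list of pairs keeps the parameter order fixed.
    return BASE_URL + "?" + urlencode(params)


url = build_search_url(u"바둑")
print(url)
```

urlencode() escapes values the same way quote_plus() does, so a space in the query comes out as "+" (for example, u"go game" becomes query=go+game). The resulting string contains only ASCII characters and can be passed to urllib2.urlopen() as in the answer.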