UnicodeWarning: Unicode equal comparison. How do I replace non-standard characters in a NavigableString datatype?
I am using BeautifulSoup with Python 2.7 to scrape a website. Here is my code:
# -*- coding: utf-8 -*-
from BeautifulSoup import BeautifulSoup
import urllib
import json
url = 'http://www.website.com'
file_pointer = urllib.urlopen(url)
html_object = BeautifulSoup(file_pointer)
type_select = html_object('select',{'id':'which'})
for option in type_select:
    value = option('option')
    for type_value in value:
        type = type_value.contents[0]
        param_1 = type_value['value']
        print 'Type:', type
        url2 = 'http://www/website.com/' + param_1
        file_pointer2 = urllib.urlopen(url2)
        html_object2 = BeautifulSoup(file_pointer2)
        result = json.loads(str(html_object2))
        for json1 in result['DATA']:
            category = json1[0].title()
            param_2 = json1[0]
            print ' Category:', category
            url3 = 'http://www/website.com/' + param_2 + '&which=' + param_1
            file_pointer3 = urllib.urlopen(url3)
            html_object3 = BeautifulSoup(file_pointer3)
            result2 = json.loads(str(html_object3))
            for json2 in result2['DATA']:
                sub_category = json2[0]
                param_3 = sub_category.replace(' ','+').replace('&','%26')
                print ' sub_category:', sub_category
                for i in param_3:
                    if i == 'â':
                        print i
                        ...
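I need to replace the 'â' character so that the fourth URL request can continue my scrape, but no matter how I try to replace it (u'\u2019', â, etc.), I get a UnicodeEncodeError.
I tried converting param_3 to a string (since it is a BeautifulSoup NavigableString datatype) and then doing the replace, but I got the same error, this time on my str(param_3) line. Finally I tried this for loop comparison and got the warning: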
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
if i == 'â':
I am at a loss here. How can I translate this character and replace it with something else in param_3?
Any help is appreciated! Thanks in advance!
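For reference, here is a minimal sketch of the kind of replacement I am trying to achieve. It assumes the offending character is U+2019 (the right single quotation mark mentioned above) and that param_3 holds unicode text. The helper name build_param, the apostrophe used as the replacement, and the sample sub-category value are hypothetical and are not taken from the real site. The idea is to compare and replace unicode with unicode, then encode to UTF-8 explicitly before percent-encoding, so that nothing is converted through ASCII implicitly. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

try:
    from urllib import quote_plus          # Python 2
except ImportError:
    from urllib.parse import quote_plus    # Python 3


def build_param(text, replacement="'"):
    """Replace U+2019 in unicode text, then percent-encode it for a URL.

    Hypothetical helper for illustration only.
    """
    # Replace unicode with unicode. In Python 2, comparing a unicode
    # character against a non-ASCII byte string such as 'â' makes Python
    # try an ASCII decode, which fails and raises the UnicodeWarning.
    cleaned = text.replace('\u2019', replacement)
    # Encode to UTF-8 bytes explicitly before quoting, so no implicit
    # ASCII conversion takes place.
    return quote_plus(cleaned.encode('utf-8'))


# Hypothetical sample value standing in for json2[0].
sub_category = 'Men\u2019s Shoes & Boots'
param_3 = build_param(sub_category)
print(param_3)  # Men%27s+Shoes+%26+Boots
```

Here quote_plus also takes care of the space and '&' substitutions that the chained replace(' ','+').replace('&','%26') calls perform by hand in my code.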