2014-09-06

Any idea how to fix this? UnicodeEncodeError: 'ascii' codec can't encode character u'\u2730' in position 1: ordinal not in range(128)

import csv 
import re 
import time 
import urllib2 
from urlparse import urljoin 
from bs4 import BeautifulSoup 

BASE_URL = 'http://omaha.craigslist.org/sys/' 
URL = 'http://omaha.craigslist.org/sya/' 
FILENAME = '/Users/mona/python/craigstvs.txt' 

opener = urllib2.build_opener() 
opener.addheaders = [('User-agent', 'Mozilla/5.0')] 
soup = BeautifulSoup(opener.open(URL)) 

with open(FILENAME, 'a') as f: 
    writer = csv.writer(f, delimiter=';') 
    for link in soup.find_all('a', class_=re.compile("hdrlnk")): 
        timeset = time.strftime("%m-%d %H:%M")

        item_url = urljoin(BASE_URL, link['href'])
        item_soup = BeautifulSoup(opener.open(item_url))

        # do smth with the item_soup? or why did you need to follow this link?

        writer.writerow([timeset, link.text, item_url])
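The error can be reproduced without any scraping. This sketch assumes a hypothetical link text u'a\u2730', so the star sits at position 1 as in the traceback. Encoding it with the ASCII codec, which is what Python 2 falls back to when it implicitly turns a unicode object into a byte string, raises the same exception.

```python
# Hypothetical link text: the character from the traceback is at index 1.
title = u'a\u2730'

error = None
try:
    # Explicit ASCII encoding, equivalent to Python 2's implicit conversion.
    title.encode('ascii')
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    print(exc)
```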

Answers

In my experience, Python 2's csv module does not fully support Unicode, but you may find this approach useful:

import codecs 
... 
codecs.open('file.csv', 'r', 'UTF-8') 

to open the file, or you may prefer to handle the encoding yourself instead of using the csv module.
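A minimal sketch of the second option, writing the delimited line yourself instead of going through the csv module. It swaps in io.open (available in Python 2.6+ and 3) for codecs.open; append_row is a hypothetical helper, the sample row is made up, and since no quoting is done it assumes no field contains ';' or a newline.

```python
import io
import os
import tempfile


def append_row(path, fields, delimiter=u';'):
    """Hypothetical helper: append one delimited line, encoded as UTF-8.

    Assumes no field contains the delimiter or a newline (nothing is quoted).
    """
    with io.open(path, 'a', encoding='utf-8') as f:
        f.write(delimiter.join(fields) + u'\n')


# Hypothetical row shaped like the question's [timeset, link text, item_url].
path = os.path.join(tempfile.mkdtemp(), 'craigstvs.txt')
append_row(path, [u'09-06 12:00', u'TV \u2730 for sale', u'http://omaha.craigslist.org/sya/'])

# Read the file back with the same encoding.
with io.open(path, encoding='utf-8') as f:
    content = f.read()
```

All fields must be unicode strings here, because io.open in text mode writes text and does the UTF-8 encoding itself.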

You just need to encode the text:

link.text.encode("utf-8") 
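To see what the encode call does, here is a small sketch with a hypothetical link text. The unicode string becomes UTF-8 bytes (a str in Python 2), so the csv module has nothing left to convert with the ASCII codec. Decoding with the same codec gives the original text back.

```python
# Hypothetical link text containing the character from the traceback.
text = u'a\u2730'

# UTF-8 encodes 'a' as 1 byte and U+2730 as 3 bytes.
data = text.encode('utf-8')
print(len(text), len(data))  # Python 3 prints: 2 4

# Round trip: decoding with the same codec restores the text.
restored = data.decode('utf-8')
```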

You can also use requests instead of urllib2:

import csv
import re
import time
from urlparse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'http://omaha.craigslist.org/sys/'
URL = 'http://omaha.craigslist.org/sya/'
FILENAME = 'craigstvs.txt'

soup = BeautifulSoup(requests.get(URL).content)
with open(FILENAME, 'a') as f:
    writer = csv.writer(f, delimiter=';')
    for link in soup.find_all('a', class_=re.compile("hdrlnk")):
        timeset = time.strftime("%m-%d %H:%M")
        item_url = urljoin(BASE_URL, link['href'])
        item_soup = BeautifulSoup(requests.get(item_url).content)
        # do smth with the item_soup? or why did you need to follow this link?
        # Pass UTF-8 bytes so the Python 2 csv module never converts unicode via ASCII
        writer.writerow([timeset, link.text.encode("utf-8"), item_url])
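The .encode("utf-8") fix applies to Python 2, which the question uses (urllib2, urlparse). On Python 3 the csv module works with str directly and the file object does the encoding, so the equivalent is to open the file with an explicit encoding and drop the encode call. A minimal sketch, assuming Python 3 and a made-up sample row:

```python
import csv
import os
import tempfile

# Hypothetical row shaped like [timeset, link text, item_url]; assumes Python 3.
rows = [['09-06 12:00', 'TV \u2730 for sale', 'http://omaha.craigslist.org/sya/']]

path = os.path.join(tempfile.mkdtemp(), 'craigstvs.txt')

# The csv docs recommend newline='' for files handed to csv.writer/csv.reader.
with open(path, 'a', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, delimiter=';')
    for row in rows:
        writer.writerow(row)  # str is written as-is; no .encode() needed

# Read the rows back with the same encoding and delimiter.
with open(path, encoding='utf-8', newline='') as f:
    read_back = list(csv.reader(f, delimiter=';'))
```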