2015-06-27 106 views
0

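UnicodeEncodeError: Scraping data using Python and beautifulsoup4

I want to scrape data from the PGA website to get a list of all the golf courses in the USA, and write that data to a CSV file. My problem is that after running my script I get this error. Can anyone help me fix the error and show how I can extract the data?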

Here is the error message:

File "/Users/AGB/Final_PGA2.py", line 44, in
writer.writerow(row)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 35: ordinal not in range(128)

The script is below:

import csv 
import requests 
from bs4 import BeautifulSoup 

courses_list = [] 
for i in range(906):  # Number of pages plus one 
    url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i) 
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

    g_data2 = soup.find_all("div", {"class": "views-field-nothing"})

    for item in g_data2:
        try:
            name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
            print name
        except:
            name = ''
        try:
            address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
        except:
            address1 = ''
        try:
            address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
        except:
            address2 = ''
        try:
            website = item.contents[1].find_all("div", {"class": "views-field-website"})[0].text
        except:
            website = ''
        try:
            Phonenumber = item.contents[1].find_all("div", {"class": "views-field-work-phone"})[0].text
        except:
            Phonenumber = ''

        course = [name, address1, address2, website, Phonenumber]

        courses_list.append(course)


with open('PGA_Final.csv', 'a') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow(row)
+0

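Could you edit your post so that it displays correctly? If you indent the whole thing by 4 spaces, it will be shown as a code block instead of unformatted text. –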

+0

I edited the post; the edit is awaiting approval. – Leb

+0

http://stackoverflow.com/questions/30551429/error-writing-data-to-csv-due-to-ascii-error-in-python/30551550#30551550 –

Answers

0
with open('PGA_Final.csv', 'a') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow(row)

Change it to:

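with open('PGA_Final.csv', 'a') as file:
    writer = csv.writer(file)
    for row in courses_list:
        # Encode each field: row is a list, so row.encode('utf-8') raises AttributeError
        writer.writerow([field.encode('utf-8') for field in row])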

Or:

import codecs 
.... 
with codecs.open('PGA_Final.csv', 'a', encoding='utf-8') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow(row)
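
As a rough Python 3 sketch of the same idea (the question's script is Python 2, so this is an illustration and not a drop-in patch): the error appears when text containing a character such as U+201C is encoded as ASCII, and it goes away once the output file has an explicit UTF-8 encoding. The helper names `ascii_encoding_fails`, `write_rows_utf8` and `read_rows_utf8`, and the sample row, are made up for this example.

```python
import csv
import os
import tempfile

# Hypothetical sample row; U+201C (a left curly quote) is the character
# named in the traceback. The course name and address are invented.
rows = [["\u201cThe Links\u201d Golf Club", "1 Fairway Dr",
         "Anytown, NY 00000", "", "555-0100"]]


def ascii_encoding_fails(text):
    """Return True if text cannot be encoded with the ASCII codec."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


def write_rows_utf8(path, rows):
    """Write rows to a CSV file whose text is encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows_utf8(path):
    """Read a UTF-8 encoded CSV file back into a list of rows."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


# Round trip through a temporary file to show that nothing is lost.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
write_rows_utf8(demo_path, rows)
print(ascii_encoding_fails(rows[0][0]), read_rows_utf8(demo_path) == rows)
```

The point of the design is to pick the encoding once, where the file is opened, so that individual fields never need to be encoded by hand.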
+0

You can also use [`codecs.open`](https://docs.python.org/2/library/codecs.html#codecs.open), which works like the regular `open` but also accepts an `encoding` kwarg. –

+0

I added another solution based on your suggestion. – Leb

+1

AttributeError: 'list' object has no attribute 'encode' with the first option – Gonzalo68

1

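You should not get this error under Python 3. Here is a code example that also fixes some unrelated issues in your code. It parses the specified fields on a given web page and saves them as CSV: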

#!/usr/bin/env python3 
import csv 
from urllib.request import urlopen 
import bs4 # $ pip install beautifulsoup4 

page = 905 
url = ("http://www.pga.com/golf-courses/search?page=" + str(page) + 
     "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0" 
     "&course_type=both&has_events=0") 
with urlopen(url) as response: 
    field_content = bs4.SoupStrainer('div', 'views-field-nothing') 
    soup = bs4.BeautifulSoup(response, parse_only=field_content) 

fields = [bs4.SoupStrainer('div', 'views-field-' + suffix) 
      for suffix in ['title', 'address', 'city-state-zip', 'website', 'work-phone']] 

def get_text(tag, default=''): 
    return tag.get_text().strip() if tag is not None else default 

with open('pga.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    for div in soup.find_all(field_content):
        writer.writerow([get_text(div.find(field)) for field in fields])
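
The example above handles a single page (`page = 905`), while the question loops over every page. One possible way to generate the per-page URLs is sketched below, using only the standard library. The names `build_url` and `page_urls` are hypothetical, the query parameters are copied from the question's URL, and whether the site still answers these requests is not something this sketch checks.

```python
from urllib.parse import urlencode

# Base address taken from the question's script.
BASE_URL = "http://www.pga.com/golf-courses/search"


def build_url(page):
    """Return the search URL for one results page (hypothetical helper)."""
    params = [
        ("page", page),
        ("searchbox", "Course Name"),  # urlencode turns the space into '+'
        ("searchbox_zip", "ZIP"),
        ("distance", 50),
        ("price_range", 0),
        ("course_type", "both"),
        ("has_events", 0),
    ]
    return BASE_URL + "?" + urlencode(params)


def page_urls(num_pages):
    """Yield the URL for each page index from 0 to num_pages - 1."""
    for page in range(num_pages):
        yield build_url(page)


# Print the first two URLs only; no network request is made here.
for url in page_urls(2):
    print(url)
```

Each URL from `page_urls(906)` (the page count used in the question) could then be fetched and parsed the same way as the single page above, with all rows written through one `csv.writer`.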