2014-03-26 54 views
0

csv module and pygeocoder error

I am using pygeocoder to geocode a list of addresses. Here is my code:

import csv 
import pandas as pd 
from pygeocoder import Geocoder 
from pygeocoder import GeocoderError 

df = pd.read_csv(r'C:\Users\L\Desktop\germanfdiaddress.csv', encoding="iso-8859-1")

address = df.Address 
print address 
add=[] 
lat=[] 
lng=[] 
pcode=[] 

for a in address:
    try:
        result = Geocoder.geocode(a)
        lat.extend([result[0].coordinates[0]])
        lng.extend([result[0].coordinates[1]])
        pcode.extend([result[0].postal_code])
    except GeocoderError:
        continue
    # NOTE: the lookup is repeated here, outside the try block, so every
    # successful address is appended twice and an error here is not caught
    result = Geocoder.geocode(a)
    lat.extend([result[0].coordinates[0]])
    lng.extend([result[0].coordinates[1]])
    pcode.extend([result[0].postal_code])

fields= 'add','lat', 'lng', 'pcode' 
rows=zip(address,lat,lng,pcode) 

with open(r'C:\Users\L\Desktop\myfile.csv', 'wb') as outfile:
    w = csv.writer(outfile)
    w.writerow(fields)
    for i in rows:
        w.writerow(i)

However, I get the following error:

Traceback (most recent call last): 
    File "C:\Users\Jesus\Dropbox\coding\python\geocoder with uft-8, with complete output.py", line 42, in <module> 
    w.writerow(i) 
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 13: ordinal not in range(128) 

Any idea what is going on? I know my code works apart from writing the CSV file.

Here is the CSV file: https://www.dropbox.com/s/6yprg2u1ghuygye/germanfdiaddress.csv

+0

You need to use 'codecs.open' and set the encoding of the file you are writing to.

Answers

0

The Python 2 csv module's trouble with non-ASCII text is a well-documented issue:

This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe;

Since you are only doing simple reading and writing, you can start from the UnicodeWriter class given in the examples in the documentation.
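The core idea behind that UnicodeWriter recipe is, roughly, to encode each text cell explicitly instead of letting the writer fall back to the ASCII codec. Here is a minimal sketch of that idea only, not the recipe itself; `encode_row` and `ascii_fails` are hypothetical helper names and the sample row is made up.

```python
def encode_row(row, encoding='utf-8'):
    """Return a copy of row with every text cell encoded to bytes."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [cell.encode(encoding) if isinstance(cell, text_type) else cell
            for cell in row]


def ascii_fails(value):
    """Report whether encoding value as ASCII raises UnicodeEncodeError."""
    try:
        value.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


# Made-up sample row containing u'\xfc', the character from the traceback
row = [u'M\xfcnchen', 48.137, 11.575]
print(ascii_fails(row[0]))
print(encode_row(row))
```

On Python 2 the encoded row could then be passed to `csv.writer.writerow`; numbers are left untouched because the writer already handles them.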

Alternatively, you can simplify the code like this:

import codecs 

# ... 

with codecs.open(r'C:\Users\L\Desktop\myfile.csv', 
       mode='w', encoding='utf-8') as outfile: 

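    # Use unicode format strings: a byte-string '{}'.format(...) would try to
    # encode non-ASCII text as ASCII and fail again
    outfile.write(u'{}\n'.format(u','.join(fields)))
    for i in rows:
        # lat/lng are floats, so convert every cell to text before joining
        outfile.write(u'{}\n'.format(u','.join(unicode(v) for v in i)))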
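Joining cells by hand does no quoting, so an address that contains a comma would be split across two columns. As a sketch of the alternative, assuming Python 3 (where the csv module accepts text directly), the writer can do the quoting and the encoding is applied afterwards. The sample row is made up, and an in-memory buffer stands in for the output file.

```python
import csv
import io

fields = ('add', 'lat', 'lng', 'pcode')
# Made-up row: the address holds a comma and non-ASCII characters
rows = [(u'Hauptstra\xdfe 1, M\xfcnchen', 48.137, 11.575, u'80331')]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(fields)
writer.writerows(rows)

text = buffer.getvalue()      # CSV text, with the address quoted
data = text.encode('utf-8')   # bytes as they would be stored on disk
print(text)
```

With a real file, opening it as `open(path, 'w', encoding='utf-8', newline='')` and passing that to `csv.writer` should give the same result.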

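When you use \ as the path separator, use a raw string: r'C:\Users\L\Desktop\myfile.csv'. This prevents misinterpretation of paths such as 'C:\newfile', where \n would be read as a newline.

You can also use forward slashes (even on Windows), which removes the need for a raw string.

Alternatively, you can use os.path.join to build the file path.

The bottom line: avoid writing a bare \ in path strings.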
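A short sketch of the three options, using the 'C:\newfile' case mentioned above; the directory names passed to `os.path.join` are only illustrative.

```python
import os.path

plain = 'C:\newfile'    # \n is interpreted as a newline character
raw = r'C:\newfile'     # raw string: the backslash is kept literally
forward = 'C:/Users/L/Desktop/myfile.csv'  # forward slashes need no escaping
joined = os.path.join('C:/Users/L', 'Desktop', 'myfile.csv')

print(len(plain), len(raw))
print('\n' in plain, '\n' in raw)
print(os.path.basename(joined))
```

The plain string is one character shorter than the raw one because the two characters `\n` collapse into a single newline.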

+0

Thanks for the help. I ended up using a different approach, but it is always good to have more options. – asado23

1

So I just swapped the csv module for unicodecsv, and it works perfectly. Here is the new code:

import unicodecsv 
import pandas as pd 
from pygeocoder import Geocoder 
from pygeocoder import GeocoderError 

df = pd.read_csv(r'C:\Users\L\Desktop\germanfdiaddress.csv', encoding="iso-8859-1")

address = df.Address 
print address 
add=[] 
lat=[] 
lng=[] 
pcode=[] 

for a in address:
    try:
        result = Geocoder.geocode(a)
    except GeocoderError:
        # Skip addresses that cannot be geocoded
        continue
    # Record the address together with its results so the columns stay aligned
    add.append(a)
    lat.append(result[0].coordinates[0])
    lng.append(result[0].coordinates[1])
    pcode.append(result[0].postal_code)


fields = 'add', 'lat', 'lng', 'pcode'
rows = zip(add, lat, lng, pcode)

with open(r'C:\Users\L\Desktop\myfile.csv', 'wb') as outfile:
    w = unicodecsv.writer(outfile, encoding='iso-8859-1')
    w.writerow(fields)
    for i in rows:
        w.writerow(i)
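One thing to watch when failed lookups are skipped: the address has to be skipped as well, otherwise zip pairs later addresses with the wrong coordinates. The sketch below illustrates this with a made-up lookup table; `fake_geocode`, `FakeGeocoderError` and the coordinates are hypothetical stand-ins, not part of pygeocoder.

```python
class FakeGeocoderError(Exception):
    """Hypothetical stand-in for pygeocoder's GeocoderError."""


# Made-up coordinates, for illustration only
KNOWN = {'Berlin': (52.52, 13.405), 'Hamburg': (53.551, 9.994)}


def fake_geocode(address):
    """Hypothetical stand-in for a geocoding call; raises on unknown input."""
    if address not in KNOWN:
        raise FakeGeocoderError(address)
    return KNOWN[address]


addresses = ['Berlin', 'nowhere', 'Hamburg']

rows = []
for a in addresses:
    try:
        lat, lng = fake_geocode(a)
    except FakeGeocoderError:
        continue  # skip the whole row, address included
    rows.append((a, lat, lng))

# Zipping the full address list against the shorter result list shifts rows
lats = [r[1] for r in rows]
misaligned = list(zip(addresses, lats))
print(rows)
print(misaligned)
```

Building each row as one tuple inside the loop keeps the address and its results together regardless of how many lookups fail.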
0

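For a cleaner, more Pythonic look, you can use Geocoder (available on GitHub and PyPI) instead of pygeocoder. For the Unicode problem, UnicodeCSV works really well, and you keep the same look and feel with DictWriter and DictReader. Here is a code example: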

import geocoder
import unicodecsv
import logging

# Show INFO messages too; the default level would hide them
logging.basicConfig(level=logging.INFO)

fieldnames = ['source', 'address', 'lat', 'lng', 'postal']

# Open the input and output files together; both are closed on exit
with open('address.csv', 'rb') as f, open('address_out.csv', 'wb') as csvfile:
    # CSV Reader
    reader = unicodecsv.DictReader(f, encoding='iso-8859-1')

    # CSV Writer
    writer = unicodecsv.DictWriter(csvfile, fieldnames=fieldnames, encoding='utf-8')
    writer.writeheader()

    for line in reader:
        address = line['Address']

        # Geocoding
        g = geocoder.google(address)
        if g.ok:
            row = {}
            row['source'] = address
            row['address'] = g.address
            row['lat'] = g.lat
            row['lng'] = g.lng
            row['postal'] = g.postal
            writer.writerow(row)
            logging.info('Geocoding SUCCESS: ' + address)
        else:
            logging.warning('Geocoding ERROR: ' + address)