2014-11-21 162 views
3

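I want to use the copy_from command (the function that exposes PostgreSQL's COPY command) to load rows of CSV-like data into Postgres. My data is comma-delimited (unfortunately, since I am not the data owner, I can't simply change the delimiter). I run into a problem when I try to load a row that has a quoted value containing a comma (i.e., a comma that should not be treated as a delimiter). Psycopg2 "copy_from" command: is it possible to ignore delimiters inside quotes (getting an error)?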

For example, this row of data is fine:

",Madrid,SN,,SEN,,,SN,173,157" 

This row of data is not fine:

","Dominican, Republic of",MC,,YUO,,,MC,65,162", 

Some code:

conn = get_psycopg_conn()
cur = conn.cursor()

_io_buffer.seek(0)  # This buffer is holding the csv-like data
cur.copy_from(_io_buffer, str(table_name), sep=',', null='', columns=column_names)
conn.commit()
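
The difference between the two rows can be reproduced without a database. The sketch below uses only Python's standard `csv` module, not psycopg2. It assumes the outer quotes shown around each row in the post are just string delimiters and not part of the data. Splitting on every comma is only a rough stand-in for how a text-format load with `sep=','` treats the line, where quote characters are ordinary text. The helper names `split_plain` and `split_csv` are made up for this illustration.

```python
import csv
import io

# Rows from the question, assuming the outer quotes in the post are only
# string delimiters and not part of the data.
good_row = ',Madrid,SN,,SEN,,,SN,173,157'
bad_row = ',"Dominican, Republic of",MC,,YUO,,,MC,65,162'


def split_plain(line):
    """Split on every comma, treating quote characters as ordinary text."""
    return line.split(',')


def split_csv(line):
    """Parse one line with CSV quoting rules, so a quoted comma stays in its field."""
    return next(csv.reader(io.StringIO(line)))


# The good row has the same number of fields either way.
print(len(split_plain(good_row)), len(split_csv(good_row)))

# The bad row gains an extra field when quotes are not honoured.
plain_fields = split_plain(bad_row)
csv_fields = split_csv(bad_row)
print(len(plain_fields), len(csv_fields))
print(csv_fields[1])
```

With plain splitting the second row has one more field than the table has columns, which is consistent with the load failing on that row only.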
+0

What does the copy_from function look like? – hd1 2014-11-21 06:46:51

+0

@hdr http://initd.org/psycopg/docs/cursor.html#cursor.copy_from – 2014-11-21 07:38:23

Answers

6

It looks like copy_from doesn't expose the CSV mode or quote options, which are available from the underlying PostgreSQL COPY command. So you will need to either patch psycopg2 to add them, or use copy_expert.

I haven't tried it, but something like

curs.copy_expert("""COPY mytable FROM STDIN WITH (FORMAT CSV)""", _io_buffer) 

might be enough.
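
The call in the question also passed a column list and `null=''`, so a slightly fuller version of the same idea might look like the sketch below. It is untested against a real server. The helpers `quote_ident`, `build_copy_sql`, and `copy_csv`, the `RecordingCursor` stand-in, and the table and column names are all hypothetical and exist only for this example. The only psycopg2 call assumed is `cursor.copy_expert(sql, file)`. In PostgreSQL's CSV format an unquoted empty field is read as NULL by default, so no explicit NULL option is added here.

```python
import io


def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes.

    Note: a schema-qualified name such as schema.table would need each part
    quoted separately; this sketch only handles a single identifier.
    """
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table_name, column_names):
    """Build a COPY ... FROM STDIN statement that uses PostgreSQL's CSV format."""
    columns = ', '.join(quote_ident(c) for c in column_names)
    return 'COPY {} ({}) FROM STDIN WITH (FORMAT CSV)'.format(
        quote_ident(table_name), columns)


def copy_csv(cur, buffer, table_name, column_names):
    """Rewind the buffer and stream it through cur.copy_expert."""
    buffer.seek(0)
    cur.copy_expert(build_copy_sql(table_name, column_names), buffer)


class RecordingCursor:
    """Stand-in for a psycopg2 cursor so the sketch runs without a database."""

    def __init__(self):
        self.calls = []

    def copy_expert(self, sql, file):
        # Record the statement and the data that would have been sent.
        self.calls.append((sql, file.read()))


demo_cur = RecordingCursor()
demo_buf = io.StringIO(',"Dominican, Republic of",MC\n')
demo_buf.read()  # leave the position at the end, as it would be after writing
copy_csv(demo_cur, demo_buf, 'mytable', ['id', 'country', 'code'])
print(demo_cur.calls[0][0])
```

With a real connection, `demo_cur` would be replaced by `conn.cursor()` and followed by `conn.commit()`, as in the question's code.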

+1

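Thanks Craig, this answered my question. Unfortunately the data is getting thrown out of order somewhere along the way, so there is no chance to implement this unless I solve that. Onward! – wouldbesmooth 2014-11-24 14:38:57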

+1

This fixed it for me. Thanks. It's sad that Psycopg2 doesn't have these options built in. – sudo 2016-01-03 00:47:36

+2

@sudo Well, that's only sad until someone writes a patch to implement it. Modifying psycopg2 isn't too difficult. – 2016-01-03 03:06:59

0

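I had this same error and was able to get close to a fix based on the one-liner that craig-ringer listed. The other item I needed was to use df.to_csv(index=False,header=False, quoting=csv.QUOTE_NONNUMERIC,sep=','), in particular quoting=csv.QUOTE_NONNUMERIC, so that values that were originally objects (strings) are wrapped in quotes.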
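
To see what that quoting option changes, here is a small sketch using the standard `csv` module directly rather than pandas. The `quoting` argument of `to_csv` takes the same `csv` module constants. The sample row and the helper `to_csv_line` are made up for illustration.

```python
import csv
import io

# Hypothetical row mixing strings and numbers.
row = ['Dominican, Republic of', 'MC', 65, 162]


def to_csv_line(values, quoting):
    """Write one row to an in-memory buffer with the given quoting mode."""
    buf = io.StringIO()
    csv.writer(buf, quoting=quoting, lineterminator='\n').writerow(values)
    return buf.getvalue()


# QUOTE_MINIMAL (the csv module default) quotes only fields that need it,
# such as a field containing the delimiter.
minimal = to_csv_line(row, csv.QUOTE_MINIMAL)

# QUOTE_NONNUMERIC quotes every non-numeric field and leaves numbers bare.
nonnumeric = to_csv_line(row, csv.QUOTE_NONNUMERIC)

print(minimal, end='')
print(nonnumeric, end='')
```

Either form keeps the embedded comma inside quotes, so both should be readable by COPY in CSV format; the non-numeric mode additionally quotes every string value.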

A full example that pulls a data source from MySQL and saves it in Postgres follows:

#run in python 3.6 
import MySQLdb 
import psycopg2 
import os 
from io import StringIO 
import pandas as pd 
import csv 

mysql_db = MySQLdb.connect(host="host_address",# your host, usually localhost 
        user="user_name",   # your username 
        passwd="source_pw", # your password 
        db="source_db")  # name of the data base 

postgres_db = psycopg2.connect("host=dest_address dbname=dest_db_name user=dest_user password=dest_pw") 

my_list = ['1','2','3','4'] 

# you must create a Cursor object. It will let you execute all the queries you need 
mysql_cur = mysql_db.cursor() 
postgres_cur = postgres_db.cursor() 

for item in my_list:
    # Pull cbi data for each state and write it to postgres
    print(item)
    # Pass the value as a query parameter instead of concatenating it into the SQL
    mysql_sql = 'select * from my_table t where t.important_feature = %s;'

    # Do something to create your dataframe here...
    df = pd.read_sql_query(mysql_sql, mysql_db, params=(item,))

    # Initialize a string buffer
    sio = StringIO()
    # Write the pandas DataFrame as CSV to the buffer, quoting every non-numeric field
    sio.write(df.to_csv(index=False, header=False, quoting=csv.QUOTE_NONNUMERIC, sep=','))
    sio.seek(0)  # Be sure to reset the position to the start of the stream

    # Copy the string buffer to the database, as if it were an actual file
    with postgres_db.cursor() as c:
        print(c)
        # Schema-qualified table names use a dot: schema.new_table
        c.copy_expert("""COPY schema.new_table FROM STDIN WITH (FORMAT CSV)""", sio)
        postgres_db.commit()

mysql_db.close() 
postgres_db.close() 