2017-03-11 19 views

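How to query a Unicode database using ASCII characters

I am currently running a query against my PostgreSQL database that ignores German characters (umlauts). However, I don't want to lose these characters and would rather have the German characters, or at least their equivalents (e.g. ä = ae), in the query output. I am running Python 2.7.12.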

When I change the encoding error handler from `replace` to `xmlcharrefreplace`, I get the following error:

psycopg2.ProgrammingError: syntax error at or near "?" 
LINE 1: ?SELECT 

Code snippet:

    # -*- coding: utf-8 -*-

    connection_str = r'postgresql://' + user + ':' + password + '@' + host + '/' + database

    def query_db(conn, sql):
        with conn.cursor() as curs:
            curs.execute(sql)
            rows = curs.fetchall()

        print("fetched %s rows from db" % len(rows))

        return rows

    with psycopg2.connect(connection_str) as conn:
        for filename in files:
            # Read SQL
            sql = u""

            f = codecs.open(os.path.join(SQL_LOC, filename), "r", "utf-8")

            for line in f:
                sql += line.encode('ascii', 'replace').replace('\r\n', ' ')

            rows = query_db(conn, f)

How can I pass a query as a unicode object that contains German characters? I also tried decoding the query as utf-8, but then I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) 

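I'm a bit confused by this question, I think because of terminology. When you say "ASCII characters", do you actually mean "characters that don't fit in ASCII"? [ASCII](https://en.wikipedia.org/wiki/ASCII) is a 7-bit encoding that covers only the part of the Roman alphabet used in English (no accents or umlauts). It sounds like the non-ASCII characters are the ones you actually want. – Blckknght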
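To make the comment's point concrete, here is a minimal sketch of what Python's standard ASCII error handlers do to German text. The sample string is only an illustration; the handler names (`replace`, `ignore`, `xmlcharrefreplace`) are the standard ones accepted by `encode`.

```python
# -*- coding: utf-8 -*-
# Illustrative sample text, not taken from the asker's database.
text = u"Zwölf große Boxkämpfer"

# 'replace' substitutes a question mark for every non-ASCII character.
replaced = text.encode("ascii", "replace")

# 'ignore' silently drops every non-ASCII character.
ignored = text.encode("ascii", "ignore")

# 'xmlcharrefreplace' writes a numeric XML character reference instead.
xmlrefs = text.encode("ascii", "xmlcharrefreplace")

# UTF-8 can represent every character, so nothing is lost on a round trip.
utf8 = text.encode("utf-8")

print(replaced)
print(ignored)
print(xmlrefs)
print(utf8.decode("utf-8") == text)
```

None of the ASCII handlers preserves the umlauts, which is why encoding the SQL text to ASCII before sending it cannot return the original characters.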

Answers


Here is a solution that gets the encoded equivalent. You will be able to re-encode it later, and the query will not produce an error:

SELECT convert_from(BYTEA 'foo ᚠ bar'::bytea, 'latin-1'); 
+----------------+ 
| convert_from | 
|----------------| 
| foo á<U+009A>  bar    | 
+----------------+ 
SELECT 1 
Time: 0.011s 
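The same reinterpretation can be sketched on the Python side, assuming the text reaches the server as UTF-8 bytes. This only mirrors what `convert_from(..., 'latin-1')` appears to do in the output above; it is not part of the answer's own code.

```python
# -*- coding: utf-8 -*-
# U+16A0 is the runic letter used in the query above.
original = u"foo \u16a0 bar"

# In UTF-8 the runic letter becomes the three bytes E1 9A A0.
raw = original.encode("utf-8")

# Reading those bytes as Latin-1 yields one character per byte:
# U+00E1 (a with acute), U+009A (a control character), U+00A0 (no-break space).
as_latin1 = raw.decode("latin-1")

# Latin-1 maps every byte value to exactly one character, so the step is reversible.
restored = as_latin1.encode("latin-1").decode("utf-8")

print(repr(as_latin1))
print(restored == original)
```

Because the Latin-1 mapping is lossless, the original text can be recovered later, but the intermediate string is not readable German or runic text.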
0

You just need conn.set_client_encoding("utf-8"), and then you can simply execute unicode strings. The SQL and the results will be encoded and decoded on the fly:

$ cat psycopg2-unicode.py 
import sys 
import os 
import psycopg2 
import csv 

with psycopg2.connect("") as conn:
    conn.set_client_encoding("utf-8")
    for filename in sys.argv[1:]:
        file = open(filename, "r", encoding="utf-8")
        sql = file.read()
        with conn.cursor() as cursor:
            cursor.execute(sql)
            try:
                rows = cursor.fetchall()
            except psycopg2.ProgrammingError as err:
                # No results
                continue
            with open(filename+".out", "w", encoding="utf-8", newline="") as outfile:
                csv.writer(outfile, dialect="excel-tab").writerows(rows)

$ cat sql0.sql 
create temporary table t(v) as 
    select 'The quick brown fox jumps over the lazy dog.' 
    union all 
    select 'Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich.' 
    union all 
    select 'Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч.' 
    union all 
    select 'Mężny bądź, chroń pułk twój i sześć flag.' 
; 

$ cat sql1.sql 
select * from t; 

$ python3 psycopg2-unicode.py sql0.sql sql1.sql 

$ cat sql1.sql.out 
The quick brown fox jumps over the lazy dog. 
Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich. 
Любя, съешь щипцы, — вздохнёт мэр, — кайф жгуч. 
Mężny bądź, chroń pułk twój i sześć flag. 

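A Python 2 version of this program is a bit more complicated, because we need to tell the driver that we want values returned as unicode objects. Also, the csv module I use for output does not support unicode in Python 2, so it needs a workaround. Here it is: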

$ cat psycopg2-unicode2.py 
from __future__ import print_function 

import sys 
import os 
import csv 
import codecs 

import psycopg2 
import psycopg2.extensions 
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE) 
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY) 

with psycopg2.connect("") as conn:
    conn.set_client_encoding("utf-8")
    for filename in sys.argv[1:]:
        file = codecs.open(filename, "r", encoding="utf-8")
        sql = file.read()
        with conn.cursor() as cursor:
            cursor.execute(sql)
            try:
                rows = cursor.fetchall()
            except psycopg2.ProgrammingError as err:
                # No results from SQL
                continue
            with open(filename+".out", "wb") as outfile:
                for row in rows:
                    row_utf8 = [v.encode('utf-8') for v in row]
                    csv.writer(outfile, dialect="excel-tab").writerow(row_utf8)

This solution drops the German characters, the umlauts, altogether. So the word "Zwölf" becomes "Zwlf". This solution was built for Python 3, and I am still running Python 2.7.12. – OAK
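If ASCII-only output is acceptable, the "ä = ae" equivalents mentioned in the question could be produced after fetching the rows. The following is only a sketch: `GERMAN_ASCII` and `transliterate_german` are hypothetical names introduced here, and the mapping covers just the German umlauts and the sharp s.

```python
# -*- coding: utf-8 -*-
# Hypothetical mapping from German special characters to ASCII digraphs.
GERMAN_ASCII = {
    u"ä": u"ae", u"ö": u"oe", u"ü": u"ue",
    u"Ä": u"Ae", u"Ö": u"Oe", u"Ü": u"Ue",
    u"ß": u"ss",
}


def transliterate_german(text):
    """Replace umlauts and sharp s with ASCII digraphs; leave other characters alone."""
    return u"".join(GERMAN_ASCII.get(ch, ch) for ch in text)


sentence = u"Zwölf große Boxkämpfer jagen Viktor quer über den Sylter Deich."
print(transliterate_german(sentence))
```

Characters outside the mapping, such as the Cyrillic or Polish letters in the sample data, pass through unchanged, so the result is not guaranteed to be pure ASCII.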


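I had assumed Python 3 because you used the print function. I have now ported the example to Python 2. However, I recommend moving to Python 3 for any program that deals with Unicode; it is much safer. – Tometzky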


My solution was `f = codecs.open(os.path.join(SQL_LOC, filename), "r", "utf-8-sig")` followed by `f.read()`. The key was to decode the SQL file as utf-8-sig. – OAK
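A likely reason this works, assuming the SQL file was saved with a UTF-8 byte order mark (the thread does not state this explicitly): the `utf-8` codec keeps the BOM as the character U+FEFF at the start of the text, while `utf-8-sig` strips it. The sketch below writes a throwaway file to show the difference; the file contents are made up for illustration.

```python
# -*- coding: utf-8 -*-
import codecs
import io
import os
import tempfile

# Create a temporary SQL file that starts with a UTF-8 BOM (illustrative content).
fd, path = tempfile.mkstemp(suffix=".sql")
with os.fdopen(fd, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + u"SELECT 'Zwölf';".encode("utf-8"))

# Plain utf-8 keeps the BOM as the first character of the text.
with io.open(path, "r", encoding="utf-8") as f:
    with_bom = f.read()

# utf-8-sig removes the BOM if one is present.
with io.open(path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()

os.remove(path)

# Encoding the BOM to ASCII with 'replace' turns it into a leading question mark.
bom_as_ascii = with_bom.encode("ascii", "replace")

print(repr(with_bom[0]))
print(without_bom)
print(bom_as_ascii)
```

Under that assumption, the leading `?` in `?SELECT` from the original error would be the replaced BOM rather than anything in the query itself.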
