Why does SQLAlchemy perform worse with psycopg2 use_native_unicode?

I'm having a hard time figuring out why a simple SELECT query using raw SQL takes so long with SQLAlchemy. I get 14,600 rows/sec, but when I run the same query through psycopg2 without SQLAlchemy, I get 38,421 rows/sec.

After some poking around, I realized that toggling SQLAlchemy's use_native_unicode parameter in the create_engine call makes a huge difference.

This query takes 0.5 secs to retrieve 7300 rows:

from sqlalchemy import create_engine 

engine = create_engine("postgresql+psycopg2://localhost...", 
         use_native_unicode=True) 
r = engine.execute("SELECT * FROM logtable") 
fetched_results = r.fetchall()

This query takes 0.19 secs to retrieve the same 7300 rows:

engine = create_engine("postgresql+psycopg2://localhost...", 
         use_native_unicode=False) 
r = engine.execute("SELECT * FROM logtable") 
fetched_results = r.fetchall()

The only difference between the two queries is use_native_unicode. But SQLAlchemy's own documentation says it is better to leave use_native_unicode=True (http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html).

Does anyone know why use_native_unicode makes such a big performance difference? What are the consequences of turning use_native_unicode off?

Source: 2012-11-20, Bob Dover

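This is a decision you need to make based on how much non-ASCII data you're dealing with. Assuming SQLA's C extensions aren't in use, psycopg2's method of decoding unicode is faster than SQLAlchemy's. It still adds latency to the result set, though, compared with doing no unicode conversion at all. SQLAlchemy's unicode facilities aren't used in your code above. They only come into play when a column is mapped to a Unicode or String type. That happens only when you use text(), select(), or their ORM-level equivalents, where Unicode types are mapped to the result set columns through the Table metadata or the "typemap" argument to text().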
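To make the "only when a column is mapped to a Unicode type" point concrete, here is a minimal Python 3 sketch of column-level conversion. A converter is chosen once per column from declared types, and only the columns declared as Unicode pay for decoding. The names UNICODE, RAW, decode_utf8 and make_row_processor are hypothetical illustrations, not SQLAlchemy's API. In SQLAlchemy itself the types would come from Table metadata or the typemap mentioned above.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type tags standing in for "mapped to a Unicode type" vs. "no type".
UNICODE = "unicode"
RAW = "raw"


def decode_utf8(value: Optional[bytes]) -> Optional[str]:
    # NULLs pass through unchanged; everything else is decoded from UTF-8.
    return None if value is None else value.decode("utf-8")


def make_row_processor(column_types: Sequence[str]) -> Callable[[tuple], tuple]:
    """Pick a converter once per column; only UNICODE columns get decoded."""
    processors: List[Optional[Callable]] = [
        decode_utf8 if col_type == UNICODE else None for col_type in column_types
    ]

    def process(row: tuple) -> tuple:
        # Columns without a processor are returned exactly as the driver gave them.
        return tuple(
            value if proc is None else proc(value)
            for proc, value in zip(processors, row)
        )

    return process


process = make_row_processor([RAW, UNICODE, RAW])
print(process((1, b"caf\xc3\xa9", b"left as bytes")))  # (1, 'café', b'left as bytes')
```

The design choice worth noting is that the per-column decision is made once, when the processor is built, so rows for untyped columns cost nothing beyond the tuple rebuild.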

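psycopg2's native unicode facility, OTOH, takes effect at the cursor level. It is therefore always in effect, and it evidently adds to the overall latency.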
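Cursor-level conversion means the cost scales with every text value returned, whether or not the application needs it as unicode. The following Python 3 sketch only counts conversions to show that difference. CountingDecoder, cursor_level and column_level are hypothetical names, and the sketch models neither psycopg2's C typecaster nor SQLAlchemy's result processors.

```python
class CountingDecoder:
    """Decodes UTF-8 bytes and counts how many conversions it performed."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, value: bytes) -> str:
        self.calls += 1
        return value.decode("utf-8")


def cursor_level(rows, decode):
    # A cursor-wide typecaster converts every text value in every row.
    return [tuple(decode(v) if isinstance(v, bytes) else v for v in row) for row in rows]


def column_level(rows, decode, wanted_columns):
    # Only the column positions the caller asked for are converted.
    return [
        tuple(
            decode(v) if i in wanted_columns and isinstance(v, bytes) else v
            for i, v in enumerate(row)
        )
        for row in rows
    ]


# Hypothetical result set: an id plus two text columns, only one of which needs decoding.
rows = [(i, b"abcdefghij" * 50, b"note") for i in range(1000)]

everywhere = CountingDecoder()
cursor_level(rows, everywhere)
selective = CountingDecoder()
column_level(rows, selective, {1})
print(everywhere.calls, selective.calls)  # 2000 1000
```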

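Below is a series of illustrations of how the different approaches work. The last one is closest to what SQLAlchemy does, although when SQLAlchemy's C extensions are in use we're probably just as fast as psycopg2: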

# Python 2 script (xrange, print statement). Under Python 2, psycopg2 returns
# text columns as byte strings unless a UNICODE typecaster is registered.
import psycopg2
from psycopg2 import extensions

conn = psycopg2.connect(user='scott', password='tiger', host='localhost', database='test')

# Build a 10,000-row test table of 500-character ASCII strings.
cursor = conn.cursor()
cursor.execute("""
create table data (
    id SERIAL primary key,
    data varchar(500)
)
""")

cursor.executemany("insert into data (data) values (%(data)s)", [
    {"data": "abcdefghij" * 50} for i in xrange(10000)
])
cursor.close()


def one(conn):
    # Baseline: fetch every row, no unicode conversion at all.
    cursor = conn.cursor()
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        row[0]

def two(conn):
    # psycopg2 native unicode: the UNICODE typecaster is registered on the
    # cursor, so the driver decodes every text value it returns.
    cursor = conn.cursor()
    extensions.register_type(extensions.UNICODE, cursor)
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        row[0]

def three(conn):
    # Decode in Python, inline in the loop.
    cursor = conn.cursor()
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        row[0].decode('utf-8')

def four(conn):
    # Decode in Python through a per-value function call, more like
    # SQLAlchemy's pure-Python unicode conversion.
    cursor = conn.cursor()
    def conv_unicode(value):
        return value.decode('utf-8')
    cursor.execute("SELECT data FROM data")
    for row in cursor:
        conv_unicode(row[0])

import timeit

print "no unicode:", timeit.timeit("one(conn)", "from __main__ import conn, one", number=100)

print "native unicode:", timeit.timeit("two(conn)", "from __main__ import conn, two", number=100)

print "in Python unicode:", timeit.timeit("three(conn)", "from __main__ import conn, three", number=100)

print "more like SQLA's unicode:", timeit.timeit("four(conn)", "from __main__ import conn, four", number=100)

The timings I get:

no unicode: 2.10434007645 
native unicode: 4.52875208855 
in Python unicode: 4.77912807465 
more like SQLA's unicode: 4.88325881958
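A quick way to read these numbers is to express each one relative to the no-conversion baseline and as extra time per decoded value. The sketch below does only that arithmetic on the figures above. It assumes one value per row, since the script selects a single column, which gives 100 runs × 10,000 rows = 1,000,000 values. relative_costs is a hypothetical helper.

```python
# Timings reported in the answer above: timeit with number=100 over a
# 10,000-row table, selecting a single varchar column.
TIMINGS = {
    "no unicode": 2.10434007645,
    "native unicode": 4.52875208855,
    "in Python unicode": 4.77912807465,
    "more like SQLA's unicode": 4.88325881958,
}
VALUES_PER_TRIAL = 100 * 10_000  # 100 runs x 10,000 rows x 1 column


def relative_costs(timings, baseline="no unicode", n_values=VALUES_PER_TRIAL):
    """Return {name: (ratio to baseline, extra microseconds per value)}."""
    base = timings[baseline]
    return {
        name: (seconds / base, (seconds - base) / n_values * 1e6)
        for name, seconds in timings.items()
    }


for name, (ratio, extra_us) in relative_costs(TIMINGS).items():
    print(f"{name:<26} {ratio:.2f}x  +{extra_us:.2f} us per value")
```

With these inputs the native path comes out at roughly 2.15x the baseline, about 2.4 µs of extra time per decoded value. The two pure-Python variants land at about 2.27x and 2.32x. These figures only restate the answer's own measurements from one machine under Python 2.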

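What's interesting here is that SQLA's approach, when the C extensions are in use, may actually be a better choice than psycopg2's native approach. That holds if you aren't using the Unicode type heavily and most of your string values are plain ASCII.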
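Because the cost in question is per-value decoding, the Python side of this comparison can be approximated without a database. This is a Python 3 sketch, not the answer's script. The payload mirrors the test table, but there is no driver, network, or C extension involved, so only the relative gap between the variants means anything. time_variants and the variant names are hypothetical.

```python
import timeit

# Approximation of the answer's test data: 10,000 rows holding one 500-character
# ASCII value each, as undecoded bytes (the form a driver returns before conversion).
ROWS = [(b"abcdefghij" * 50,) for _ in range(10_000)]


def no_conversion(rows):
    for row in rows:
        _ = row[0]


def inline_decode(rows):
    for row in rows:
        _ = row[0].decode("utf-8")


def conv_unicode(value):
    return value.decode("utf-8")


def via_function(rows):
    # Extra Python-level call per value, like a per-column result processor.
    for row in rows:
        _ = conv_unicode(row[0])


def time_variants(rows, number=100):
    variants = {
        "no conversion": no_conversion,
        "inline decode": inline_decode,
        "via function": via_function,
    }
    # fn=fn binds each variant at definition time inside the comprehension.
    return {name: timeit.timeit(lambda fn=fn: fn(rows), number=number)
            for name, fn in variants.items()}


if __name__ == "__main__":
    for name, seconds in time_variants(ROWS).items():
        print(f"{name}: {seconds:.3f}s")
```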

Source: 2012-11-20 21:47:20, zzzeek

Amazing answer, thanks! Out of curiosity, what kind of machine are you running on? When I tried your test script I got ~30 secs, though granted, I'm running a 3 GB Ubuntu VM.

tl;dr: unicode handling in psycopg2 has had some recent performance improvements; try version 2.7.
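To check whether an installed psycopg2 is at least the 2.7 release recommended here, you can compare psycopg2.__version__ against it. The helper names below are hypothetical. The sketch assumes __version__ starts with a dotted number, as in '2.7.1 (dt dec pq3 ext lo64)', and it parses only that numeric prefix.

```python
import re
from typing import Tuple


def parse_psycopg2_version(version: str) -> Tuple[int, ...]:
    """Extract the leading dotted number, e.g. '2.7.1 (dt dec pq3 ext lo64)' -> (2, 7, 1)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version: str, minimum: Tuple[int, ...] = (2, 7)) -> bool:
    # Tuple comparison: (2, 7, 1) >= (2, 7) is True, (2, 6, 2) >= (2, 7) is False.
    return parse_psycopg2_version(version) >= minimum


if __name__ == "__main__":
    try:
        import psycopg2
    except ImportError:
        print("psycopg2 is not installed")
    else:
        print(psycopg2.__version__, is_at_least(psycopg2.__version__))
```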

I noticed the same thing you did and sent some timings to @zzzeek. Here is his reply on the mailing list: https://groups.google.com/d/msg/sqlalchemy/TtIel3LTGMY/Ta5oDkNdCwAJ

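Basically, though, it boils down to SQLAlchemy's C-extension unicode handling appearing to be more efficient than psycopg2's. I notified the psycopg2 mailing list, opened an issue, and got a good response (https://github.com/psycopg/psycopg2/issues/473).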

Source: 2017-04-20 17:34:02, Ugtar
