2014-04-27

Python 3.3 dict: how to convert struct PyDictKeysObject to a Python class?

I want to modify Brandon Rhodes's code "Routines that examine the internals of a CPython dictionary" so that it works with CPython 3.3.

I believe I have successfully translated this struct:

typedef PyDictKeyEntry *(*dict_lookup_func) 
    (PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject ***value_addr); 

struct _dictkeysobject { 
    Py_ssize_t dk_refcnt; 
    Py_ssize_t dk_size; 
    dict_lookup_func dk_lookup; 
    Py_ssize_t dk_usable; 
    PyDictKeyEntry dk_entries[1]; 
}; 
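The trailing `PyDictKeyEntry dk_entries[1]` uses the C idiom of declaring a one-element array at the end of a struct and allocating extra room after it, so the real length (here `dk_size`) is only known at run time. A minimal ctypes sketch of the same idea on memory we allocate ourselves, together with the `_dk_entries`/`property` trick used below to view it. `IntBag`, `n`, `items` and `make_bag` are hypothetical names for illustration only.

```python
from ctypes import POINTER, Structure, c_int, c_ssize_t, cast, sizeof

class IntBag(Structure):
    # Hypothetical C equivalent: struct { Py_ssize_t n; int items[1]; };
    _fields_ = [
        ('n', c_ssize_t),
        ('items', c_int * 1),   # declared length 1; real length is n
    ]

# Keep the raw field descriptor under another name, then shadow it with a
# property that reinterprets the same address as an array of n ints.
IntBag._items = IntBag.items
IntBag.items = property(lambda s: cast(s._items, POINTER(c_int * s.n))[0])

def make_bag(values):
    """Allocate one contiguous block holding an IntBag plus len(values) ints."""
    n = len(values)
    extra = (max(n, 1) - 1) * sizeof(c_int)   # room past the 1-element array
    buf = bytearray(sizeof(IntBag) + extra)
    bag = IntBag.from_buffer(buf)             # bag keeps buf alive
    bag.n = n
    for i, v in enumerate(values):
        bag.items[i] = v
    return bag

bag = make_bag([10, 20, 30, 40])
print(list(bag.items))   # [10, 20, 30, 40]
```

Reading past the declared end of `items` is only safe because `make_bag` really allocated that memory; CPython does the same when it creates a keys object with `dk_size` entries.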

I think the following now looks right:

from ctypes import Structure, POINTER, cast, py_object, CFUNCTYPE, c_ssize_t

# PyDictKeyEntry and PyDictObject are defined separately.
# Py_hash_t is a typedef for Py_ssize_t, so c_ssize_t is the matching type
# (c_ulong is only 32 bits on 64-bit Windows).
LOOKUPFUNC = CFUNCTYPE(POINTER(PyDictKeyEntry), POINTER(PyDictObject),
                       py_object, c_ssize_t, POINTER(POINTER(py_object)))

class PyDictKeysObject(Structure):
    """A key object"""
    _fields_ = [
        ('dk_refcnt', c_ssize_t),
        ('dk_size', c_ssize_t),
        ('dk_lookup', LOOKUPFUNC),
        ('dk_usable', c_ssize_t),
        ('dk_entries', PyDictKeyEntry * 1),
    ]

PyDictKeysObject._dk_entries = PyDictKeysObject.dk_entries
PyDictKeysObject.dk_entries = property(lambda s:
    cast(s._dk_entries, POINTER(PyDictKeyEntry * s.dk_size))[0])
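One way to check a translation like this is to compare the ctypes field offsets with the C layout. A minimal sketch, assuming a build where `Py_ssize_t` and pointers (including function pointers) all have the same size, as on common 32-bit and 64-bit platforms. Here `dk_lookup` is declared as `c_void_p` only as a same-sized stand-in for the function pointer, and `layout` is a hypothetical helper.

```python
from ctypes import Structure, c_ssize_t, c_void_p, py_object, sizeof

class PyDictKeyEntry(Structure):
    # Py_hash_t me_hash; PyObject *me_key; PyObject *me_value;
    _fields_ = [
        ('me_hash', c_ssize_t),   # Py_hash_t is a typedef for Py_ssize_t
        ('me_key', py_object),
        ('me_value', py_object),
    ]

class PyDictKeysObject(Structure):
    _fields_ = [
        ('dk_refcnt', c_ssize_t),
        ('dk_size', c_ssize_t),
        ('dk_lookup', c_void_p),  # same size/alignment as a function pointer
        ('dk_usable', c_ssize_t),
        ('dk_entries', PyDictKeyEntry * 1),
    ]

def layout(struct_type):
    """Return (field name, byte offset) pairs for a ctypes Structure."""
    return [(name, getattr(struct_type, name).offset)
            for name, _ in struct_type._fields_]

word = sizeof(c_void_p)
print(layout(PyDictKeysObject))          # offsets of 0, 1, 2, 3, 4 words
print(sizeof(PyDictKeyEntry) == 3 * word)
```

If one offset disagrees with what the C compiler produces (for example, a field was missed or given a wrong type), every field after it is read from the wrong place.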

This line now works, where d == {0: 0, 1: 1, 2: 2, 3: 3}:

obj = cast(id(d), POINTER(PyDictObject)).contents  # works!!
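The cast on `id(d)` relies on a CPython implementation detail: `id()` returns the object's memory address. Later releases changed the fields after `ma_used` (3.6, for example, added `ma_version_tag` and switched to a compact keys table), so a sketch that stays on safer ground reads only the object header and `ma_used`. It assumes a standard (not free-threaded) CPython build, where, as far as I know, these leading fields have kept their offsets since 3.3; `DictHead` and `dict_head` are hypothetical names.

```python
from ctypes import POINTER, Structure, c_ssize_t, c_void_p, cast

class DictHead(Structure):
    # Only the leading fields of PyDictObject: PyObject_HEAD, then ma_used.
    _fields_ = [
        ('ob_refcnt', c_ssize_t),
        ('ob_type', c_void_p),    # PyTypeObject *, kept opaque
        ('ma_used', c_ssize_t),   # number of items currently in the dict
    ]

def dict_head(d):
    """Return a live view of the header of dict `d`.

    The view does not keep `d` alive; the caller must hold a reference.
    """
    if type(d) is not dict:
        raise TypeError('expected a dict, got %r' % type(d).__name__)
    return cast(id(d), POINTER(DictHead)).contents

d = {0: 0, 1: 1, 2: 2, 3: 3}
head = dict_head(d)
print(head.ma_used)              # same as len(d)
print(head.ob_type == id(dict))  # ob_type points at the dict type object
```

Because `head` is a view rather than a copy, inserting into `d` is immediately visible in `head.ma_used`.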

Here is my translation of the C struct PyDictObject:

from ctypes import Structure, POINTER, py_object, c_ssize_t, c_void_p

class PyDictObject(Structure):  # an incomplete type
    """A dictionary object."""

    def __len__(self):
        """Return the number of dictionary entry slots."""
        pass

    def slot_of(self, key):
        """Find and return the slot at which `key` is stored."""
        pass

    def slot_map(self):
        """Return a mapping of keys to their integer slot numbers."""
        pass

PyDictObject._fields_ = [
    ('ob_refcnt', c_ssize_t),
    ('ob_type', c_void_p),
    ('ma_used', c_ssize_t),
    ('ma_keys', POINTER(PyDictKeysObject)),
    ('ma_values', POINTER(py_object)),  # points to an array of pointers
]

Note: you can link directly to [hg.python.org](http://hg.python.org/cpython/file/3.3/Objects/dictobject.c#l72). Try `ctypes.CFUNCTYPE` to define `dict_lookup_func`. – jfs


Update: I have now declared the type of dk_lookup using CFUNCTYPE. – LeslieK


@J.F.Sebastian: Thanks. I have now declared the type of dk_lookup with CFUNCTYPE. Does dk_entries look right? The C code uses dk_entries[1]. – LeslieK

Answer


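My question was about accessing the C structs underlying a Python dictionary as implemented in CPython 3.3. I started from the C structs given in cpython/Objects/dictobject.c and Include/dictobject.h. Three C structs are involved in defining a dictionary: PyDictObject, PyDictKeysObject, and PyDictKeyEntry. The correct translation of each C struct into Python is shown below; the comments point out where I needed to make fixes. Thanks to @eryksun for guiding me along the way!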

from ctypes import (Structure, POINTER, CFUNCTYPE, cast, py_object,
                    c_ssize_t, c_void_p)

class PyDictKeyEntry(Structure):
    """An entry in a dictionary."""
    _fields_ = [
        ('me_hash', c_ssize_t),  # Py_hash_t is a typedef for Py_ssize_t
        ('me_key', py_object),
        ('me_value', py_object),
    ]

class PyDictObject(Structure):
    """A dictionary object."""
    pass

LOOKUPFUNC = CFUNCTYPE(POINTER(PyDictKeyEntry), POINTER(PyDictObject),
                       py_object, c_ssize_t, POINTER(POINTER(py_object)))

class PyDictKeysObject(Structure):
    """An object of key entries."""
    _fields_ = [
        ('dk_refcnt', c_ssize_t),
        ('dk_size', c_ssize_t),
        ('dk_lookup', LOOKUPFUNC),  # a function prototype, per the ctypes docs
        ('dk_usable', c_ssize_t),
        # Declared as an array of size 1, but CPython allocates room for
        # dk_size entries past the end of the struct (a new, larger keys
        # object replaces this one when the dict is resized). This
        # variable-sized field was the trickiest part to translate.
        ('dk_entries', PyDictKeyEntry * 1),
    ]

PyDictObject._fields_ = [
    ('ob_refcnt', c_ssize_t),  # Py_ssize_t translates to c_ssize_t per the ctypes docs
    ('ob_type', c_void_p),     # PyTypeObject *; an opaque pointer is enough here
    ('ma_used', c_ssize_t),
    ('ma_keys', POINTER(PyDictKeysObject)),
    ('ma_values', POINTER(py_object)),  # PyObject **; py_object is PyObject *
]

# Keep the original field descriptor under a new name, then replace
# dk_entries with a property. Every time dk_entries is read on a
# PyDictKeysObject instance, the property casts the address of the
# one-element array to a pointer to an array of dk_size entries and
# dereferences it with [0], returning the whole array; the caller then
# indexes the i-th element, e.g. dk_entries[i].
PyDictKeysObject._dk_entries = PyDictKeysObject.dk_entries
PyDictKeysObject.dk_entries = property(
    lambda s: cast(s._dk_entries, POINTER(PyDictKeyEntry * s.dk_size))[0])
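The `property` above views the entries through a pointer to a whole array (`POINTER(T * n)`, dereferenced once with `[0]`) rather than through a pointer to one element (`POINTER(T)`, indexed with `[i]`). A small sketch with plain `c_int` values standing in for `PyDictKeyEntry` shows that both views alias the same storage; the variable names are illustrative only.

```python
from ctypes import POINTER, c_int, cast

storage = (c_int * 5)(10, 11, 12, 13, 14)

# Pointer to the entire array: [0] yields one array object of length 5.
whole = cast(storage, POINTER(c_int * 5))[0]

# Pointer to a single element: [i] steps i * sizeof(c_int) bytes from the base.
elem = cast(storage, POINTER(c_int))

print(len(whole), list(whole))   # 5 [10, 11, 12, 13, 14]
print(elem[3])                   # 13
whole[3] = 99                    # write through the array view ...
print(elem[3], storage[3])       # 99 99  ... visible through every view
```

The array view carries its length, so `len()`, iteration and slicing work; the element pointer has no length, so iterating over it would run past the end of the data.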

The following function gives access to the PyDictObject underlying a Python dictionary:

def dictobject(d): 
    """Return the PyDictObject lying behind the Python dict `d`.""" 
    if not isinstance(d, dict): 
     raise TypeError('cannot create a dictobject from %r' % (d,)) 
    return cast(id(d), POINTER(PyDictObject)).contents 
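Because these `Structure` classes describe only the 3.3 layout, following `ma_keys` on another interpreter version would read the wrong offset and can crash the process. A minimal sketch of a guarded variant, assuming you want to refuse anything other than CPython 3.3; `checked_dictobject` is a hypothetical name, and the structure type is passed in so the guard can be shown on its own.

```python
import platform
import sys
from ctypes import POINTER, cast

SUPPORTED = (3, 3)   # the layout the Structure classes in this answer describe

def checked_dictobject(d, struct_type):
    """Cast dict `d` to `struct_type` (e.g. PyDictObject), only on CPython 3.3."""
    if not isinstance(d, dict):
        raise TypeError('cannot create a dictobject from %r' % (d,))
    if platform.python_implementation() != 'CPython':
        raise RuntimeError('id() is not guaranteed to be an address here')
    if sys.version_info[:2] != SUPPORTED:
        raise RuntimeError('dict layout differs on Python %d.%d'
                           % sys.version_info[:2])
    return cast(id(d), POINTER(struct_type)).contents
```

On 3.3 it would be called as `checked_dictobject(d, PyDictObject)`; elsewhere it fails with an exception instead of reading arbitrary memory.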

If d is a Python dictionary of key-value pairs, then obj is the PyDictObject instance that holds those key-value pairs:

obj = cast(id(d), POINTER(PyDictObject)).contents 

The PyDictKeysObject instance is:

key_obj = obj.ma_keys.contents 

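The key stored in slot 0 of the dictionary (me_key holds a pointer to it, which ctypes dereferences to return the key object itself) is: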

key_obj.dk_entries[0].me_key 
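Repeating that access for every index from 0 to `dk_size - 1` gives one possible body for the `slot_map` stub from the question. A sketch, assuming the 3.3 layout above (so it should only be pointed at a real dict on CPython 3.3) and taking a `PyDictKeysObject` rather than a `PyDictObject`; the class definitions are repeated so the block runs on its own. An empty slot has a NULL `me_key`, which ctypes reports as `ValueError` when the `py_object` field is read. In 3.3, deleted entries keep a placeholder (dummy) key, so a complete version would also need to skip those; this sketch does not.

```python
from ctypes import (CFUNCTYPE, POINTER, Structure, c_ssize_t, cast,
                    py_object)

class PyDictKeyEntry(Structure):
    _fields_ = [
        ('me_hash', c_ssize_t),   # Py_hash_t
        ('me_key', py_object),
        ('me_value', py_object),
    ]

class PyDictObject(Structure):    # only needed for the LOOKUPFUNC prototype
    pass

LOOKUPFUNC = CFUNCTYPE(POINTER(PyDictKeyEntry), POINTER(PyDictObject),
                       py_object, c_ssize_t, POINTER(POINTER(py_object)))

class PyDictKeysObject(Structure):
    _fields_ = [
        ('dk_refcnt', c_ssize_t),
        ('dk_size', c_ssize_t),
        ('dk_lookup', LOOKUPFUNC),
        ('dk_usable', c_ssize_t),
        ('dk_entries', PyDictKeyEntry * 1),
    ]

PyDictKeysObject._dk_entries = PyDictKeysObject.dk_entries
PyDictKeysObject.dk_entries = property(
    lambda s: cast(s._dk_entries, POINTER(PyDictKeyEntry * s.dk_size))[0])

def slot_map(keys_obj):
    """Map each key found in `keys_obj` to the index of its slot."""
    result = {}
    for slot, entry in enumerate(keys_obj.dk_entries):
        try:
            key = entry.me_key     # ValueError if the slot's pointer is NULL
        except ValueError:
            continue               # empty slot
        result[key] = slot
    return result
```

On CPython 3.3 it would be called as `slot_map(dictobject(d).ma_keys.contents)`.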

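The program that uses these classes, together with a routine that probes for hash collisions of each key inserted into the dictionary, is located here. My code is a modification of code written by Brandon Rhodes for Python 2.x. His code is here.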
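The probing mentioned here follows CPython's open-addressing scheme. A minimal sketch of the slot sequence that 3.3's `lookdict` visits for a given hash, transcribed from my reading of `dictobject.c` (`PERTURB_SHIFT` is 5 there, and `size_t` is assumed to be 64 bits); `probe_slots` is a hypothetical helper, not the routine in either linked program.

```python
from itertools import islice

PERTURB_SHIFT = 5               # value used by CPython 3.3's dictobject.c
SIZE_T_MASK = (1 << 64) - 1     # emulate 64-bit unsigned size_t arithmetic

def probe_slots(hash_value, dk_size):
    """Yield the slot indexes CPython 3.3 would try for `hash_value`.

    `dk_size` must be a power of two, as it is for real key tables.
    """
    mask = dk_size - 1
    perturb = hash_value & SIZE_T_MASK    # perturb = (size_t)hash
    i = perturb & mask                    # i = (size_t)hash & mask
    yield i
    while True:
        i = ((i << 2) + i + perturb + 1) & SIZE_T_MASK
        yield i & mask
        perturb >>= PERTURB_SHIFT

print(list(islice(probe_slots(0, 8), 5)))   # [0, 1, 6, 7, 4]
```

For small non-negative integers `hash(n) == n`, so this is the order in which an 8-slot table would be searched for the key `0`; a real lookup stops at the first slot that is empty or holds an equal key.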


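@eryksun I realize I was confused by "a pointer to an array, the entire array". When I came back to study my confusion I saw your comment, and I really appreciate it. So when we access the attribute dk_entries, we get back the entire array. Why define the struct to contain the entire array? [Eli Bendersky comments](http://eli.thegreenplace.net/2010/01/11/pointers-to-arrays-in-c/) that he really can't imagine why one would use a pointer to an array in real life. – LeslieK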


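Eli's example misses the common case of using `int (*p)[4]` or `int p[][4]` as a parameter. He only looks at `(*p)[2] = 10`, but C knows here that `*p` is an array of 4 int values, so `p[1]` adds `4 * sizeof(int)` to the base address. We can therefore treat it intuitively as an n×4 two-dimensional array, e.g. `p[1][2] = 10`. C99 even lets us pass the number of columns as a parameter, e.g. `void test(size_t n, size_t m, int p[][m])`. – eryksun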
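The same row-stride arithmetic can be tried from Python: casting a flat ctypes buffer to `POINTER(c_int * 4)` mirrors C's `int (*p)[4]`, so `p[1]` starts `4 * sizeof(c_int)` bytes after the base. A small sketch with an arbitrary 3×4 buffer (the sizes are illustrative):

```python
from ctypes import POINTER, c_int, cast

ROWS, COLS = 3, 4
flat = (c_int * (ROWS * COLS))()   # zero-initialised, like int buf[12]

# Equivalent of C's `int (*p)[4] = (int (*)[4])buf;`
p = cast(flat, POINTER(c_int * COLS))

p[1][2] = 10                 # row 1, column 2
print(flat[1 * COLS + 2])    # 10: the write landed at flat index 6
```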


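*"Why define the struct to contain the entire array?"* I haven't looked into this in depth. My guess is that it may improve performance for small dictionaries: the struct and its entries are allocated in a single call as one contiguous block, which improves cache locality. But you would really have to ask someone more familiar with the design... – eryksun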
