2013-09-01 20 views
7

這可能是微不足道的,但我不確定我的理解,我試着用Google搜索,但沒有找到令人信服的答案。爲什麼Python中的空字典的大小與非空字典的大小相同?

>>> sys.getsizeof({}) 
140 
>>> sys.getsizeof({'Hello':'World'}) 
140 
>>> 
>>> yet_another_dict = {} 
>>> for i in xrange(5000): 
     yet_another_dict[i] = i**2 

>>> 
>>> sys.getsizeof(yet_another_dict) 
98444 

我該如何理解? 爲什麼空字典與非空字典相同?

+1

必須觀看錄像帶上的錄像:[The mighty dictionary](http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-the-mighty-dictionary-55-3352147) –

回答

9

有兩個原因:

  1. 字典僅持有,對對象的引用而不是對象本身,所以它的大小是沒有用的物體的大小,它包含相關,但由字典包含的引用(項目)的數量。

  2. 更重要的是,詞典預先分配了塊的引用的內存。所以當你創建一個字典時,它已經預先分配了第一個n引用的內存。當它填滿內存時,它會預先分配一個新塊。

你可以觀察到行爲,運行代碼的下一個和平。

d = {} 
size = sys.getsizeof(d) 
print size 
i = 0 
j = 0 
while i < 3: 
    d[j] = j 
    j += 1 
    new_size = sys.getsizeof(d) 
    if size != new_size: 
     print new_size 
     size = new_size 
     i += 1 

打印出:

280 
1048 
3352 
12568 

在我的機器,但是這取決於系統的架構(32位,64位)。

7

CPython中的字典直接在字典對象本身中分配少量密鑰空間(4-8個條目取決於版本和編譯選項)。從dictobject.h

/* PyDict_MINSIZE is the minimum size of a dictionary. This many slots are 
* allocated directly in the dict object (in the ma_smalltable member). 
* It must be a power of 2, and at least 4. 8 allows dicts with no more 
* than 5 active entries to live in ma_smalltable (and so avoid an 
* additional malloc); instrumentation suggested this suffices for the 
* majority of dicts (consisting mostly of usually-small instance dicts and 
* usually-small dicts created to pass keyword arguments). 
*/ 
#ifndef Py_LIMITED_API 
#define PyDict_MINSIZE 8 

注意CPython的也調整大小批量的字典,以避免頻繁的重新分配種植字典。從dictobject.c

/* If we added a key, we can safely resize. Otherwise just return! 
* If fill >= 2/3 size, adjust size. Normally, this doubles or 
* quaduples the size, but it's also possible for the dict to shrink 
* (if ma_fill is much larger than ma_used, meaning a lot of dict 
* keys have been * deleted). 
* 
* Quadrupling the size improves average dictionary sparseness 
* (reducing collisions) at the cost of some memory and iteration 
* speed (which loops over every possible entry). It also halves 
* the number of expensive resize operations in a growing dictionary. 
* 
* Very large dictionaries (over 50K items) use doubling instead. 
* This may help applications with severe memory constraints. 
*/ 
if (!(mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2)) 
    return 0; 
return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);