Python2字典中的非單調內存消耗

有人可以解釋CPython 2.7中字典的非單調內存使用情況嗎？Python2字典中的非單調內存消耗

>>> import sys 
>>> sys.getsizeof({}) 
280 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5}) 
280 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6}) 
1048 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7}) 
1048 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'e 
ight': 8}) 
664 
>>> sys.getsizeof({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'e 
ight': 8, 'nine': 9}) 
664

Python3是合理的在這裏，它打印的{'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7}大小爲480

我想這在Ubuntu 15.10和OS X 10.11。

來源

2016-04-30 Dmytro Sirenko

https://github.com/python/cpython/blob/2.7/Objects /dictobject.c –

TLDR：6和7條目字典文字預設哈希表嚴重，然後在調整大小上四倍的大小。

當CPython的2.7評估一個字典文字，它開始在條目填充之前，它使用來創建字典操作碼是BUILD_MAP。這需要一個參數，對於字典有多少條目包含一個提示，which it uses to presize the dict：

TARGET(BUILD_MAP) 
    { 
     x = _PyDict_NewPresized((Py_ssize_t)oparg); 
     PUSH(x); 
     if (x != NULL) DISPATCH(); 
     break; 
    }

這是爲了儘量減少時間創建過程中字典的大小調整的數量，但由於他們沒有考慮的加載因子，它並不完全消除調整大小。

如source code comments所示，_PyDict_NewPresized旨在「創建預先確定大小以容納估計數量的元素的新字典」。創建的字典中的哈希表的確切大小受到許多實現細節的影響，例如最小大小（#define PyDict_MINSIZE 8）以及大小爲2的冪的要求（以避免在實現中需要劃分）。

對於dict文字多達7個條目，_PyDict_NewPresized初始化8項哈希表;對於8個條目，它初始化一個16條目的散列表，因爲它使用的調整大小例程總是選擇比參數大的容量。

Dicts resize on insertion when they become at least 2/3 full.對於6-和7-條目字典的文字，該字典內開始了與8個條目，所以在第六插入發生調整大小。該字典內足夠小，調整大小四倍的哈希表的大小：

return dictresize(mp, (mp->ma_used > 50000 ? 2 : 4) * mp->ma_used);

mp->ma_used是在這一點上在哈希表中，6中使用的條目的數量。 6小於50000，所以我們稱dictresize(mp, 4 * 6)，它將散列表的大小調整爲32個條目，其中2的最小功率大於24。

相比之下，對於8項詞典文字，散列表從16個條目開始。該字典在創建過程中不會變爲2/3，因此最初的16條散列表在字典創建後仍然存在，並且結果字典小於6和7條字典字面值。

的Python 3使用different growth policy，其他字典實現變化中，這就是爲什麼你在Python看到不同的結果3.

來源

2017-02-09 00:22:53 user2357112

嗯，我已經嘗試了一下，讓我們來看看：

dct = {'four': 3, 'three': 2, 'two': 1, 'one': 0} 
print(sys.getsizeof(dct))        # = 272 
print(sys.getsizeof(dict(dct)))      # = 272 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 272 

dct = {'four': 3, 'three': 2, 'five': 4, 'two': 1, 'one': 0} 
print(sys.getsizeof(dct))        # = 272 
print(sys.getsizeof(dict(dct)))      # = 272 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 272 

dct = {'six': 5, 'three': 2, 'two': 1, 'four': 3, 'five': 4, 'one': 0} 
print(sys.getsizeof(dct))        # = 1040 
print(sys.getsizeof(dict(dct)))      # = 656 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040 

dct = {'seven': 6, 'six': 5, 'three': 2, 'two': 1, 'four': 3, 'five': 4, 'one': 0} 
print(sys.getsizeof(dct))        # = 1040 
print(sys.getsizeof(dict(dct)))      # = 656 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040 

dct = {'seven': 6, 'six': 5, 'three': 2, 'two': 1, 'four': 3, 'five': 4, 'eight': 7, 'one': 0} 
print(sys.getsizeof(dct))        # = 656 
print(sys.getsizeof(dict(dct)))      # = 1040 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040

我不知道什麼樣的優化是在這裏發生的，但我認爲這是因爲這些結構使用不同的「最佳實踐」。我的意思是何時爲散列表分配多少內存。例如，如果你有十個一個或多個元素，你得到另一個奇怪的差異：

dct = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10:10, 11:11} 
print(sys.getsizeof(dct))        # = 1808 
print(sys.getsizeof(dict(dct)))      # = 1040 
print(sys.getsizeof({k: v for k, v in dct.items()})) # = 1040

所以這可能只是某種內存消耗「優化」創建以不同的方式字典的時候，爲什麼會存在這種不單調異常對於使用6或7個元素時的字面語法：我不知道。也許一些內存優化出了問題，它是一個錯誤，它分配了太多的內存？我還沒有閱讀源代碼（還）。

來源

2016-04-30 13:57:52 MSeifert

Python2字典中的非單調內存消耗

回答

相關問題