Compression will not always reduce the length of a string!
Consider the following code:
import zlib
import bz2

def comptest(s):
    # Python 2 syntax: str is a byte string, so it can be compressed directly.
    print 'original length:', len(s)
    print 'zlib compressed length:', len(zlib.compress(s))
    print 'bz2 compressed length:', len(bz2.compress(s))
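The helper above uses Python 2 print statements. On Python 3, zlib.compress and bz2.compress expect bytes rather than str, so a rough equivalent might look like the sketch below. The name comptest_py3 and the choice of UTF-8 encoding are assumptions made for illustration, not part of the original answer.

```python
import bz2
import zlib


def comptest_py3(text):
    """Python 3 sketch of comptest; returns (original, zlib, bz2) lengths."""
    # Assumption: the input is text, so encode it to bytes before compressing.
    data = text.encode("utf-8")
    sizes = (len(data), len(zlib.compress(data)), len(bz2.compress(data)))
    print("original length:", sizes[0])
    print("zlib compressed length:", sizes[1])
    print("bz2 compressed length:", sizes[2])
    return sizes


comptest_py3("")
```

For plain ASCII input the byte lengths match the character counts, so the numbers are comparable with the Python 2 sessions in this answer.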
Let's try an empty string:
In [15]: comptest('')
original length: 0
zlib compressed length: 8
bz2 compressed length: 14
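So zlib produces 8 extra characters, and bz2 produces 14. Compression methods usually put a "header" in front of the compressed data, for use by the decompression program. This header adds to the length of the output.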
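The fixed overhead can be measured directly. The following is a minimal Python 3 sketch, not part of the original answer: it compares the normal zlib output with a raw DEFLATE stream (requested by passing a negative wbits value to zlib.compressobj), which omits the zlib wrapper. The helper names raw_deflate and zlib_overhead are hypothetical.

```python
import bz2
import zlib


def raw_deflate(data):
    """Compress data as a raw DEFLATE stream, without the zlib wrapper."""
    # A negative wbits value tells zlib to omit its header and checksum.
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
    )
    return compressor.compress(data) + compressor.flush()


def zlib_overhead(data):
    """Number of bytes the zlib wrapper adds around the raw stream."""
    return len(zlib.compress(data)) - len(raw_deflate(data))


for sample in (b"", b"test"):
    print(repr(sample), "zlib wrapper overhead:", zlib_overhead(sample))

# bz2 also emits a fixed amount of framing, even for empty input.
print("bz2 length for empty input:", len(bz2.compress(b"")))
```

In the zlib format the wrapper is a 2-byte header plus a 4-byte Adler-32 checksum at the end, so strictly speaking part of the overhead is a trailer rather than a header.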
Let's test a single word:
In [16]: comptest('test')
original length: 4
zlib compressed length: 12
bz2 compressed length: 40
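Even if you subtract the length of the header, compression has not made the word any shorter. That is because in this case there is very little to compress: most of the characters in the string occur only once. Now for a short sentence: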
In [17]: comptest('This is a compression test of a short sentence.')
original length: 47
zlib compressed length: 52
bz2 compressed length: 73
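Again the compressed output is larger than the input text. Because the text is so short, there is little repetition in it, so it does not compress well.
You need a fairly long block of text for compression to actually pay off: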
In [22]: rings = '''
....: Three Rings for the Elven-kings under the sky,
....: Seven for the Dwarf-lords in their halls of stone,
....: Nine for Mortal Men doomed to die,
....: One for the Dark Lord on his dark throne
....: In the Land of Mordor where the Shadows lie.
....: One Ring to rule them all, One Ring to find them,
....: One Ring to bring them all and in the darkness bind them
....: In the Land of Mordor where the Shadows lie.'''
In [23]: comptest(rings)
original length: 410
zlib compressed length: 205
bz2 compressed length: 248
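What really drives the result is repetition, and longer texts simply tend to contain more of it. A small Python 3 sketch (an illustration under assumptions, not taken from the original answer) compares a repetitive input with pseudo-random bytes of the same length. The helper name ratio and the sample data are made up for the example.

```python
import random
import zlib


def ratio(data):
    """Compressed size divided by original size (below 1.0 means smaller)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive input: the same phrase repeated 40 times.
repetitive = b"One Ring to rule them all, " * 40

# Pseudo-random bytes of the same length; a fixed seed keeps runs repeatable.
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(len(repetitive)))

# A short input can still shrink if it is repetitive enough.
short_repetitive = b"ab" * 24

print("repetitive ratio:", ratio(repetitive))
print("noisy ratio:", ratio(noisy))
print("short repetitive ratio:", ratio(short_repetitive))
```

The repetitive input should shrink to a small fraction of its size, while the pseudo-random bytes should not shrink meaningfully, whatever their length.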
See http://en.literateprograms.org/Huffman_coding_(Python) – GodMan
Why doesn't zlib help you? – Sergey
I tried returning a string, but it didn't work. – moenad