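Python - Compress an ASCII string

I'm looking for a way to compress an ASCII-based string. Any help?

I also need to decompress it. I tried zlib, but it didn't help.

What can I do to compress the string to a smaller length?

Code: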

def compress(request):
    if request.POST:
        data = request.POST.get('input')
        if is_ascii(data):
            result = zlib.compress(data)
            return render_to_response('index.html', {'result': result, 'input': data}, context_instance=RequestContext(request))
        else:
            result = "Error, the string is not ascii-based"
            return render_to_response('index.html', {'result': result}, context_instance=RequestContext(request))
    else:
        return render_to_response('index.html', {}, context_instance=RequestContext(request))

Source

2012-10-13 moenad

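See http://en.literateprograms.org/Huffman_coding_(Python) – GodMan

Why doesn't zlib help you? – Sergey

I tried to return a string, but it didn't work. – moenad

Compression does not always reduce the length of a string!

Consider the following code;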

import zlib 
import bz2 

def comptest(s): 
    print 'original length:', len(s) 
    print 'zlib compressed length:', len(zlib.compress(s)) 
    print 'bz2 compressed length:', len(bz2.compress(s))

Let's try an empty string;

In [15]: comptest('') 
original length: 0 
zlib compressed length: 8 
bz2 compressed length: 14

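So zlib produces 8 extra characters, and bz2 produces 14. Compression methods usually put a "header" in front of the compressed data for use by the decompression routine. This header increases the length of the output.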
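To make that fixed overhead concrete, here is a minimal Python 3 sketch (the code above is Python 2). It relies on the zlib container format, where a stream starts with a 2-byte header and ends with a 4-byte Adler-32 checksum of the uncompressed data, so part of the overhead is actually a trailer. The helper name `zlib_framing` is made up for illustration.

```python
import zlib

def zlib_framing(data: bytes):
    """Split a zlib stream into header, deflate body and checksum trailer."""
    blob = zlib.compress(data)
    header, body, trailer = blob[:2], blob[2:-4], blob[-4:]
    return header, body, trailer

header, body, trailer = zlib_framing(b"test")

# The low nibble of the first byte is the compression method; 8 means deflate.
print("method:", header[0] & 0x0F)

# The trailer is the big-endian Adler-32 checksum of the original data.
print("checksum matches:", int.from_bytes(trailer, "big") == zlib.adler32(b"test"))

# These bytes are added no matter how short the input is.
print("framing bytes:", len(header) + len(trailer))
```

The remaining bytes of overhead come from the deflate body itself, which has to mark at least one block even for empty input.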

Let's test a single word;

In [16]: comptest('test') 
original length: 4 
zlib compressed length: 12 
bz2 compressed length: 40

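Even if you subtract the length of the header, compression did not make the word any shorter. That is because in this case there is very little to compress: most characters in the string occur only once. Now a short sentence;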

In [17]: comptest('This is a compression test of a short sentence.') 
original length: 47 
zlib compressed length: 52 
bz2 compressed length: 73

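Again the compressed output is larger than the input text. Because the text is so short, it contains little repetition, so it does not compress well.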
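The role of repetition can be seen in isolation with a small Python 3 sketch. The `ratio` helper and the two sample inputs are invented for illustration; the seeded pseudo-random bytes stand in for data with almost no redundancy.

```python
import random
import zlib

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data)) / len(data)

# 1200 bytes that repeat the same three characters over and over.
repetitive = b"abc" * 400

# 1200 pseudo-random bytes; the fixed seed keeps the run reproducible.
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(1200))

print("repetitive:", ratio(repetitive))
print("noisy:", ratio(noisy))
```

Both inputs have the same length, so any difference in the ratio comes from how much redundancy the compressor can find.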

You need a fairly long block of text before compression really starts to pay off;

In [22]: rings = ''' 
    ....:  Three Rings for the Elven-kings under the sky, 
    ....:  Seven for the Dwarf-lords in their halls of stone, 
    ....:  Nine for Mortal Men doomed to die, 
    ....:  One for the Dark Lord on his dark throne 
    ....:  In the Land of Mordor where the Shadows lie. 
    ....:  One Ring to rule them all, One Ring to find them, 
    ....:  One Ring to bring them all and in the darkness bind them 
    ....:  In the Land of Mordor where the Shadows lie.''' 

In [23]: comptest(rings)      
original length: 410 
zlib compressed length: 205 
bz2 compressed length: 248

Source

2012-10-13 11:39:09

Note that in Python 3 the input to "zlib.compress" and "bz2.compress" must be bytes, so you have to encode() the string first – mschrimpf
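A minimal Python 3 sketch of that round trip, covering decompression as well, might look like the following. The function names `compress_text` and `decompress_text` are hypothetical, and the sketch assumes the input really is pure ASCII (otherwise `encode("ascii")` raises `UnicodeEncodeError`).

```python
import zlib

def compress_text(text: str) -> bytes:
    """Encode an ASCII string to bytes, then compress the bytes."""
    return zlib.compress(text.encode("ascii"))

def decompress_text(blob: bytes) -> str:
    """Decompress the bytes, then decode them back to a string."""
    return zlib.decompress(blob).decode("ascii")

original = "One Ring to rule them all, " * 20
packed = compress_text(original)

print(len(original), "->", len(packed))

# The round trip gives back exactly the original string.
assert decompress_text(packed) == original
```

Note that the compressed result is a `bytes` object, not a string, so it cannot be treated as text without a further encoding step.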

Your data doesn't even need to be ASCII; you can feed zlib anything

>>> import zlib 
>>> a='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' # + any binary data you want 
>>> print zlib.compress(a) 
x�KL$ 
� 
>>>

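What you probably want here is for the compressed data to be an ASCII string as well. Am I right?
If so, you should know that you then have a very small alphabet for encoding the compressed data, so you will need more symbols.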

For example, you can use base64 to encode the binary data (you will get an ASCII string), but it will take roughly a third more space.
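As a sketch of that approach in Python 3, the compressed bytes can be wrapped in base64 so the result is a plain ASCII string. The names `pack_ascii` and `unpack_ascii` are made up for this example.

```python
import base64
import zlib

def pack_ascii(text: str) -> str:
    """Compress text and return the result as a base64 ASCII string."""
    compressed = zlib.compress(text.encode("ascii"))
    return base64.b64encode(compressed).decode("ascii")

def unpack_ascii(packed: str) -> str:
    """Reverse pack_ascii: base64-decode, decompress, decode to str."""
    return zlib.decompress(base64.b64decode(packed)).decode("ascii")

text = "In the Land of Mordor where the Shadows lie. " * 10
raw = zlib.compress(text.encode("ascii"))
packed = pack_ascii(text)

print("compressed bytes:", len(raw))
# base64 emits 4 characters for every 3 input bytes, rounded up.
print("base64 characters:", len(packed))
```

For short inputs the base64 expansion can cancel out whatever the compressor saved, so it is worth comparing `len(packed)` with `len(text)` before relying on this.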

Source

2012-10-13 09:44:20 Sergey
