2013-02-04 107 views
0

我正在LZ77壓縮程序中工作,當我嘗試壓縮116Kb文件時,處理時間太長。我的代碼有什麼問題嗎?我該如何能夠改進算法的處理時間?LZ77壓縮算法緩慢處理

import fileinput 

class Assign: 

    def pattern(self, data): 

     self.skip = [] 

     self.m = len(data) 
     for k in range(256): self.skip.append(self.m) 

     for k in range(self.m - 1): self.skip[ord(data[k])] = self.m - k - 1 

     self.skip = tuple(self.skip) 

     self.data = data 
    def find(self, text): 
     n = len(text) 
     if self.m > n: return -1 
     k = self.m - 1 
     while k < n: 
      j = self.m - 1; i = k 
      while j >= 0 and text[i] == self.data[j]: 
       j -= 1; i -= 1 
      if j == -1: return i + 1 
      k += self.skip[ord(text[k])] 
     return -1 

class LZ77: 

    def __init__(self, data): 
     self.position = 0 
     self.window = "" 
     self.stream = data 
     self.streamSize = len(self.stream) 
     self.search = Assign() 
    def Encode(self): 
     p = 0 
     c = '' 
     lastresult = 0 
     found = 0 
     for i in range(self.streamSize): 
      self.search.pattern(self.stream[self.position:self.position+i+1]) 
      result = self.search.find(self.window) 
      if result < 0: break 
      lastresult = result 
      found = 1 
     c = self.stream[self.position+i] 
     p = lastresult 
     B = 0 
     if i > 0: B = self.position - p 
     L = i 
     if self.streamSize > 0: 
      self.position += i + 1 
      self.streamSize -= i + 1 
      self.window = self.stream[:self.position] 
     #print B,L,c 
     return ((B, L), c) 



    def Encoder(self): 

     output = "" 

     length = self.streamSize 
     while self.streamSize > 0: 
      ((B, L), C) = self.Encode() 
      output += str(B) + str(L) + C 
     return (output) 

def aiyoo(#filename): 

    enter = raw_input("enter the filename to which the original file is to be compressed to") 
    enter1 = enter 
    fob1 = open(enter,'wb') 
    original = '' 
    print filename 
    #fob = open(filename,'rb') 
    #numlines = 0 
    """while True: 
     c = fob.read(1) 
     if not c: 
      print "End of file" 
      break 
     else :""" 
    for line in fileinput.input(['hello.txt']): #size of hello.txt is 116kb 
     original = line 
     lz = LZ77(original) 
     stream = lz.Encoder()#, streamSize = lz.Encoder() 
     fob1.write(stream) 
    """for i in fob: 
     original = i 
     lz = LZ77(original) 
     stream = lz.Encoder()#, streamSize = lz.Encoder() 
     fob1.write(stream)""" 
    #fob.close() 
    fob1.close() 
    return enter 

回答

0

在問這樣的問題之前,您應該使用profile您的代碼來找出哪個部分花費最多的時間。

但無論如何,我看到你已經實現了一個子串搜索功能。改用find string method應該會顯着加速。

還要注意,在Python中實現的壓縮算法不會像在Python中那樣快。 C,甚至不接近。

0

lz77在真實存檔器(如zip和rar)中的工作方式如下:首先將所有具有相同前3-4個字節的位置插入同一個哈希桶中,然後我們只在這些條目中搜索最長匹配,通常限制司機8-32職位