2016-01-22 68 views
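Reading and writing compressed data byte by byte

I want to implement the Lempel-Ziv-Welch algorithm in Python, but I am having trouble writing my output as a binary file.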

import os
import sys

action = sys.argv[3]
if action == "compress":
    # initialize dictionary
    dictionary = {}
    for i in range(0, 256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i
    input_file = open(sys.argv[1], 'rb+')
    output_file = open(sys.argv[2], 'wb')

    data = input_file.read()
    i = 0
    j = 1
    # current_data starts as one byte
    current_data = data[i:j]
    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary:
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break
        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary:
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1
        # then write to the output file the found string minus one character
        # from the end (the longest string that is in the dictionary)
        mylist = []
        thing_to_write = format(thing_to_write, 'x')
        for char in thing_to_write:
            mylist.append(char.encode('hex'))
            for elem in mylist:
                output_file.write(elem)
    input_file.close()
    output_file.close()
    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."

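I have tried writing the output in many different formats (hexadecimal, binary, and so on), but I think I am just writing the digits out as 8-bit characters. How can I write raw binary to the file?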

What do you mean by "I am having trouble"? Do you get an error message? If so, add the full message to the question. – furas

[How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) – wwii

Answer

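It isn't clear what you are trying to write. The codes you emit will eventually exceed 255 as the dictionary grows, so they no longer fit in a single byte. I assume you want to write 2-byte unsigned integers to the output file?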

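If that is the case, I suggest you look into Python's struct.pack function, which is designed to convert Python values into their binary representation. If your values do fit in a single byte, you could write each one with output_file.write(chr(x)) (this works in Python 2, which your code is written for).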
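To make the difference concrete, here is a minimal sketch comparing raw 2-byte codes with hex digits written as text. It is written for Python 3, whereas the code in the question is Python 2. The helper names `pack_codes` and `unpack_codes` are hypothetical, and the explicit little-endian `<H` format is my assumption; any fixed byte order works as long as the reader uses the same one.

```python
import struct


def pack_codes(codes):
    """Pack each integer code as a 2-byte unsigned little-endian value."""
    return b"".join(struct.pack("<H", code) for code in codes)


def unpack_codes(blob):
    """Inverse of pack_codes: read the blob back as 2-byte unsigned values."""
    count = len(blob) // 2
    return list(struct.unpack("<%dH" % count, blob))


codes = [65, 66, 256, 258]

# Raw binary: always exactly 2 bytes per code.
raw = pack_codes(codes)

# What the question's approach amounts to: hex digits stored as characters.
as_text = "".join(format(code, "x") for code in codes).encode("ascii")

print(len(raw), repr(raw))
print(len(as_text), repr(as_text))
```

The text form has no fixed width per code, so a reader cannot tell where one code ends and the next begins, while the packed form can be split every 2 bytes.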

Here is your code using Python's struct.pack():

import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))

import sys
import struct

action = sys.argv[3]

if action == "compress":
    # initialize dictionary
    dictionary = {}

    for i in range(0, 256):
        # for single characters, the value is the same as the key
        # in the compressed file, these would appear as is
        dictionary[chr(i)] = i

    input_file = open(sys.argv[1], 'rb')
    output_file = open(sys.argv[2], 'wb')

    data = input_file.read()

    i = 0
    j = 1
    # current_data starts as one byte
    current_data = data[i:j]

    # look for the shortest string not in the dictionary
    while i < len(data) - 2:
        while current_data in dictionary:
            if j < len(data) + 1:
                j = j + 1
                current_data = data[i:j]
            else:
                break

        # once the shortest string is found, add it to the dictionary
        if current_data not in dictionary:
            dictionary[current_data] = len(dictionary)
            thing_to_write = dictionary[current_data[:-1]]
            i = j - 1
            current_data = data[i:j]
        else:
            thing_to_write = dictionary[current_data]
            i = i + 1
            j = i + 1

        # then write to the output file the code of the found string minus one
        # character from the end (the longest string that is in the dictionary)
        # 'H' packs a 2-byte unsigned integer, so codes must not exceed 65535
        output_file.write(struct.pack('H', thing_to_write))

    input_file.close()
    output_file.close()

    print >> sys.stderr, "The size of " + sys.argv[1] + " is " + str(os.path.getsize(sys.argv[1])) + " bytes." + "\n" + "The size of " + sys.argv[2] + " is " + str(os.path.getsize(sys.argv[2])) + " bytes."
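For comparison, a self-contained round trip might look like the sketch below. It is written for Python 3 and uses the textbook LZW loop rather than the index-based loop above, so treat it as an illustration under those assumptions and not as a drop-in replacement. The function names (`lzw_compress`, `lzw_decompress`, `codes_to_bytes`, `bytes_to_codes`) are hypothetical, and the dictionary is never reset, so inputs that produce codes above 65535 would make `struct.pack` fail.

```python
import struct


def lzw_compress(data):
    """Return the list of integer LZW codes for a bytes object."""
    dictionary = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            # Keep extending the match while it is still known.
            current = candidate
        else:
            # Emit the longest known match and register the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = bytes([value])
    if current:
        codes.append(dictionary[current])  # flush the final match
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    pieces = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        pieces.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(pieces)


def codes_to_bytes(codes):
    """Serialize codes as 2-byte unsigned little-endian integers."""
    return b"".join(struct.pack("<H", code) for code in codes)


def bytes_to_codes(blob):
    """Parse a blob written by codes_to_bytes back into integer codes."""
    return [code for (code,) in struct.iter_unpack("<H", blob)]


sample = b"TOBEORNOTTOBEORTOBEORNOT"
packed = codes_to_bytes(lzw_compress(sample))
restored = lzw_decompress(bytes_to_codes(packed))
print(len(sample), len(packed), restored == sample)
```

With fixed 2-byte codes, a short input like this sample comes out larger than it went in; the savings only appear once longer repeated strings are replaced by single codes.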