2011-07-20 221 views
1

How to read a binary file

I have a binary file and a specification for it:

 
after 'abst' (0x61627374): 
var1 Unsigned 8-bit integer 
var2 Unsigned 24-bit integer 
var3 Sequence of Unicode 8-bit characters (UTF-8), terminated with 0x00 

How do I read var1, var2, and var3 from the file?

+0

@Artsiom: That is hardly relevant. –

+0

@John: removed. –

Answers

1

Quick and dirty, and not tested (you have unusual bit lengths):

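# assumption: the file is small enough to fit into RAM
# and 'abst' does not occur inside the data itself
with open('data.bin', 'rb') as f:  # 'data.bin' is a placeholder file name
    data = f.read()
for hunk in data.split(b'abst')[1:]:  # skip the first hunk: it is whatever precedes the first 'abst'
    var1 = hunk[0]  # unsigned 8-bit integer
    var2 = hunk[1] + hunk[2]*256 + hunk[3]*256*256  # 24-bit, read little-endian; the spec does not state the byte order
    var3 = hunk[4:].split(b'\x00')[0].decode('utf-8')  # text up to the terminating 0x00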
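
The same idea can be sketched as a reusable function that scans for each marker instead of splitting the whole buffer. This is only a minimal sketch: the name `parse_abst` and the `byteorder` parameter are made up for illustration. The specification in the question does not state a byte order, so it is left as a parameter; "big" matches what `uint:24` does in bitstring, while "little" matches the arithmetic above.

```python
def parse_abst(data: bytes, byteorder: str = "big"):
    """Yield (var1, var2, var3) for every 'abst' marker found in data.

    Hypothetical helper; byteorder is an assumption, not part of the spec.
    """
    marker = b"abst"  # 0x61627374
    pos = data.find(marker)
    while pos != -1:
        body = pos + len(marker)
        header = data[body:body + 4]
        if len(header) < 4:
            break  # truncated record: not enough bytes for var1 + var2
        var1 = header[0]                               # unsigned 8-bit integer
        var2 = int.from_bytes(header[1:4], byteorder)  # unsigned 24-bit integer
        end = data.find(b"\x00", body + 4)
        if end == -1:
            break  # no 0x00 terminator found for var3
        var3 = data[body + 4:end].decode("utf-8")      # terminator excluded
        yield var1, var2, var3
        pos = data.find(marker, end + 1)               # continue after this record
```

Anything before the first marker is skipped, and like the snippet above it still assumes that 'abst' never occurs inside the payload.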
+0

var3 = hunk[4:] etc., too –

+0

If there is junk before 'abst', you will unpack it. –

+0

@John Thanks. (I was even caught out by that first-hunk error just yesterday *grml*) – Rudi

0

The bitstring module might be helpful here; it can be a bit more readable than unpacking the values "by hand":

import bitstring 
bitstring.bytealigned = True 
s = bitstring.ConstBitStream(your_file) 
if s.find('0x61627374'): # seeks to your start code 
    start_code, var1, var2 = s.readlist('bytes:4, uint:8, uint:24') 
    p1 = s.pos 
    p2 = s.find('0x00', start=p1)[0] # find next '\x00'; find() returns a tuple of bit positions
    var3 = s[p1:p2+8].bytes  # interpret the slice as bytes (the +8 keeps the terminating 0x00)
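
The slice above leaves var3 as raw bytes with the 0x00 terminator still attached. Since the specification calls var3 UTF-8 text, a small stdlib helper along these lines could turn it into a string; `decode_cstring` is a hypothetical name, not part of bitstring.

```python
def decode_cstring(raw: bytes) -> str:
    """Decode a 0x00-terminated UTF-8 byte sequence, dropping the terminator.

    Hypothetical helper: everything from the first 0x00 onward is ignored.
    """
    return raw.split(b"\x00", 1)[0].decode("utf-8")
```

For example, `decode_cstring(var3)` would give the text of var3 without the trailing 0x00.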