How do I search for a unique sequence in binary data?

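I am trying to read a binary file that has a header. I know that certain information is stored after the unique sequence 02 06 08 22 02 02 08 00. How can I find the position of this unique sequence?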

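I can use

String StreamReadAsText(ScriptObject stream, Number encoding, Number count)

to read the binary file one character at a time, but I suspect that is clumsy and slow.

Also, how do I compare the result of StreamReadAsText() when the output is not real text (values between 00 and 1F in the ASCII table)?

And how can I read the binary file as, for example, Int8 (the same size as one character in a string), so that I read 02, then 06, then 08, and so on?

Any help is welcome and appreciated.

Regards,

Roger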

Source

2017-06-15 Roger

Related (but not a duplicate): https://stackoverflow.com/q/34834197/1302888 – BmyGuest

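You are already on the right track by reading the file with the streaming commands. But why read the stream as text? You can read the stream as any (supported) number type by using a TagGroup object as a proxy with TagGroupReadTagDataFromStream().

There is actually an example in the section of the F1 help that lists the streaming commands, which I am simply copying here.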

Object stream = NewStreamFromBuffer(NewMemoryBuffer(256)) 
TagGroup tg = NewTagGroup(); 

Number stream_byte_order = 1; // 1 == bigendian, 2 == littleendian 
Number v_uint32_0, v_uint32_1, v_sint32_0, v_uint16_0, v_uint16_1 

// Create the tags and initialize with default values 
tg.TagGroupSetTagAsUInt32("UInt32_0", 0) 
tg.TagGroupSetTagAsUInt32("UInt32_1", 0) 
tg.TagGroupSetTagAsLong("SInt32_0", 0) 
tg.TagGroupSetTagAsUInt16("UInt16_0", 0) 
tg.TagGroupSetTagAsUInt16("UInt16_1", 0) 

// Stream the data into the tags 
TagGroupReadTagDataFromStream(tg, "UInt32_0", stream, stream_byte_order); 
TagGroupReadTagDataFromStream(tg, "UInt32_1", stream, stream_byte_order); 
TagGroupReadTagDataFromStream(tg, "SInt32_0", stream, stream_byte_order); 
TagGroupReadTagDataFromStream(tg, "UInt16_0", stream, stream_byte_order); 
TagGroupReadTagDataFromStream(tg, "UInt16_1", stream, stream_byte_order); 

// Show the taggroup, if you want 
// tg.TagGroupOpenBrowserWindow("AuxTags",0) 

// Get the data from the tags 
tg.TagGroupGetTagAsUInt32("UInt32_0", v_uint32_0) 
tg.TagGroupGetTagAsUInt32("UInt32_1", v_uint32_1) 
tg.TagGroupGetTagAsLong("SInt32_0", v_sint32_0) 
tg.TagGroupGetTagAsUInt16("UInt16_0", v_uint16_0) 
tg.TagGroupGetTagAsUInt16("UInt16_1", v_uint16_1)
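The byte-order argument matters because the same bytes decode to different numbers depending on it. As a rough, language-neutral illustration (Python with its standard `struct` module, not DM-script), the sketch below decodes the first four bytes of the signature from the question both ways. It assumes the values quoted in the question are hexadecimal.

```python
import struct

# First four bytes of the signature from the question
# (assumption: the values quoted there are hexadecimal).
raw = bytes([0x02, 0x06, 0x08, 0x22])

# ">" selects big-endian, "<" selects little-endian;
# "I" is an unsigned 32-bit integer.
big = struct.unpack(">I", raw)[0]
little = struct.unpack("<I", raw)[0]

print(hex(big))     # 0x2060822
print(hex(little))  # 0x22080602
```

If the file was written with a different byte order than the one used for reading, a multi-byte comparison against the signature will fail even when the bytes are present.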

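There is already a post on this site about searching for a pattern within a stream: Find a pattern image (binary file). It shows how to use a stream to look through an image, but you can of course use the file stream directly.

As an alternative, you can prepare a suitable image beforehand and then read a whole array from the stream with ImageReadImageDataFromStream. You can then use image operations to search for the location. Here is an example: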

// Example of reading the first X bytes of a file 
// as uInt16 data 

image ReadHeaderAsUint16(string filepath, number nBytes) 
{ 
    number kEndianness = 0 // Default byte order of the current platform 
    if (!DoesFileExist(filePath)) 
     Throw("File '" + filePath + "' not found.") 
    number fileID = OpenFileForReading(filePath) 
    object fStream = NewStreamFromFileReference(fileID, 1) 
    if (nBytes > fStream.StreamGetSize()) 
     Throw("File '" + filePath + "' has less than " + nBytes + " bytes.") 

    image buff := IntegerImage("Header", 2, 0, nBytes/2) // UINT16 array of suitable size 
    ImageReadImageDataFromStream(buff, fStream, kEndianness) 
    return buff 
} 

number FindSignature(image header, image search) 
{ 
    // 1D images only 
    if (  (header.ImageGetNumDimensions() != 1) \ 
      || (search.ImageGetNumDimensions() != 1)) 
     Throw("Only 1D images supported") 

    number sx = search.ImageGetDimensionSize(0) 
    number hx = header.ImageGetDimensionSize(0) 
    if (hx < sx) 
     return -1 

    // Create a mask of possible start locations 
    number startV = search.getPixel(0, 0) 
    image mask = (header == startV) ? 1 : 0 

    // Check every candidate position, starting from the first occurrence 
    number mx, my 
    while(max(mask, mx, my)) 
    { 
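     // Stay inside the header, and sum absolute differences so that 
     // positive and negative deviations cannot cancel out 
     if ((mx + sx <= hx) && (0 == sum(abs(header[0,mx,1,mx+sx] - search)))) 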
      return mx 
     else 
      mask.SetPixel(mx, 0, 0) 
    } 
    return -1 
} 

// Example 
// 1) Load file header as image (up to the size you want) 
string path = GetApplicationDirectory("open_save", 0) 
number maxHeaderSize = 200 
if (!OpenDialog(NULL, "Select file to open", path, path)) Exit(0) 
image headerImg := ReadHeaderAsUint16(path, maxHeaderSize ) 
headerImg.ShowImage() 

// 2) define search-header as image 
image search := [8]: { 02, 06, 08, 22, 02, 02, 08, 00 } 
// MatrixPrint(search) 

// 3) search for it in the header 
number foundAt = FindSignature(headerImg, search) 
if (-1 == foundAt) 
    Throw("The file header does not contain the search pattern.") 
else 
    // The header was read as UInt16, so each element is 2 bytes wide 
    OKDialog("Found the search pattern at offset: " + foundAt * 2 + " bytes")

Source

2017-06-15 15:51:01 BmyGuest

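Thank you for the reply. – Roger

Thank you for the reply. I am not very familiar with tags, but I was able to read the stream via tags as Int32, Int16 and Double. As I understand it, an Int32 covers four bytes such as 02 06 08 22, and an Int16 covers two bytes such as 02 06. If the stream looks like 00 02 06 08, then reading it as Int16 would miss the sequence. So I need to read it as "Int8" (I am not even sure that is what it is called), where each value covers a single byte such as 02. That way I can be sure not to miss the header sequence. The problem now is that I cannot read "Int8". Only text has that length, but I can't compare ASCII – Roger

@Roger You can read the data as a UInt8 image - IntegerImage("", 1, 0, ...) - or as an Int8 image - IntegerImage("", 1, 1, ...). Alternatively, you can search twice, with the second pass shifted by 1 byte. Unfortunately, I do not know of a command such as "GetTagAsUInt8" or "Int8" at the moment. – BmyGuest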
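The alignment problem discussed in these comments can be shown with a small, language-neutral sketch (Python, not DM-script). The helper `find_aligned` and the sample `header` bytes are hypothetical and exist only for illustration, and the signature values are assumed to be hexadecimal. A search that advances in 16-bit steps misses a signature starting at an odd byte offset, whereas a byte-wise search, or a second 16-bit pass shifted by one byte, finds it.

```python
# Signature from the question (assumption: the values are hexadecimal).
SIGNATURE = bytes([0x02, 0x06, 0x08, 0x22, 0x02, 0x02, 0x08, 0x00])


def find_aligned(data, pattern, step):
    """Return the first offset that is a multiple of `step` at which
    `pattern` occurs in `data`, or -1 if there is none."""
    for offset in range(0, len(data) - len(pattern) + 1, step):
        if data[offset:offset + len(pattern)] == pattern:
            return offset
    return -1


# Hypothetical header: three filler bytes put the signature at odd offset 3.
header = bytes([0x00, 0x11, 0x7F]) + SIGNATURE + bytes([0xAA, 0xBB])

word_hit = find_aligned(header, SIGNATURE, 2)         # 16-bit steps
byte_hit = find_aligned(header, SIGNATURE, 1)         # 8-bit steps
second_pass = find_aligned(header[1:], SIGNATURE, 2)  # 16-bit steps, shifted by 1 byte

print(word_hit)         # -1: the word-aligned pass misses the signature
print(byte_hit)         # 3
print(second_pass + 1)  # 3: add the 1-byte shift back to get the real offset
```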

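If you are on a modern machine, just load the file into memory, then scan for the sequence using a memory-compare function and a moving index.

This is not the most memory-efficient approach, or even the fastest, but it is simple and fast enough, assuming you have resources to burn.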
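A minimal sketch of that idea, written in Python rather than DM-script, might look like the following. The function names `find_signature` and `find_signature_in_file` are hypothetical, and the signature values are assumed to be hexadecimal. The slice comparison plays the role of the memory-compare function, and `index` is the moving index.

```python
from pathlib import Path

# Signature from the question (assumption: the values are hexadecimal).
SIGNATURE = bytes([0x02, 0x06, 0x08, 0x22, 0x02, 0x02, 0x08, 0x00])


def find_signature(data, pattern):
    """Return the byte offset of the first occurrence of `pattern`
    in `data`, or -1 if it does not occur."""
    index = 0
    last_start = len(data) - len(pattern)
    while index <= last_start:
        # Compare the window starting at `index` against the pattern.
        if data[index:index + len(pattern)] == pattern:
            return index
        index += 1
    return -1


def find_signature_in_file(path, pattern=SIGNATURE):
    """Load the whole file into memory, then scan it."""
    data = Path(path).read_bytes()
    return find_signature(data, pattern)


# In-memory demonstration with three hypothetical filler bytes.
sample = bytes([0x00, 0x01, 0x02]) + SIGNATURE
print(find_signature(sample, SIGNATURE))  # 3
```

In Python itself the built-in `bytes.find` returns the same offset; the explicit loop is kept here only to mirror the description above.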

Source

2017-06-15 14:30:45

Unfortunately, no such function is available in the DM-script language. However, the idea I posted below is not very different in principle. – BmyGuest
