如何檢查是否字節數組包含另一個數組

我有一個很長的字節數組，例如：如何檢查是否字節數組包含另一個數組

Byte[] bytes = {90, 80, 63, 65, 70 ...};

這幾乎是20-30 MB（理論上）。有沒有一種快速的方法，來檢查這個數組包含另一個數組例如：

Byte[] small = {63, 80, 75, 77};

首先，我需要找到他們在小陣定義的順序字節。其次，我需要在另一個數組中找到數組，而不是任何小數組的字節。謝謝大家前進。

來源

2016-01-26 kate

你能更具體嗎？你是指任何地方的任何順序的確切序列或數字？ –

[檢查一個數組是否是另一個數組的子集]可能的重複（http://stackoverflow.com/questions/332973/check-whether-an-array-is-a-subset-of-another） –

您是否在尋找對於*序列*，即那四個字節必須按照這個順序出現？如果是這樣的話，這通常被稱爲「substring search」（即使它不涉及任何語言中的字符串），並且您應該能夠查找相當多的算法。 –

-1

如果你以字節爲單位得到了數以百萬計的元素，我建議：

排序字節的小
的foreach如果找不到元素返回false

因此

bytes.Sort(); // only need to do this once. 
bool smallContained = ContainsAll(bytes, small);

and

static bool ContainsAll(int[] src, int [] subset) 
{ 
    foreach(var i in subset) 
     if (src.BinarySearch(i) < 0) 
       return false; 
    return true; 
}

來源

2016-01-26 07:32:38

我需要按照小數組中定義的順序查找字節。 – kate

static int search(byte[] haystack, byte[] needle) 
{ 
    for (int i = 0; i <= haystack.Length - needle.Length; i++) 
    { 
     if (match(haystack, needle, i)) 
     { 
      return i; 
     } 
    } 
    return -1; 
} 

static bool match(byte[] haystack, byte[] needle, int start) 
{ 
    if (needle.Length + start > haystack.Length) 
    { 
     return false; 
    } 
    else 
    { 
     for (int i = 0; i < needle.Length; i++) 
     { 
      if (needle[i] != haystack[i + start]) 
      { 
       return false; 
      } 
     } 
     return true; 
    } 
}

來源

2016-01-26 07:42:17

這不會很快，但它會工作。 :-) –

代碼有效，但不是我所需要的。 – kate

使用此：

bool isSubset = !t2.Except(t1).Any();

這是一個從@Farhad Jabiyev的鏈接： Check whether an array is a subset of another

來源

2016-01-26 07:46:02

OP不將數組視爲*集合*，而是視爲*序列*。 –

它不能解決問題。它沒有按照大數組的順序定位哪些字節是以小數組定義的。 – kate

對於性能你想要的東西，如Boyer-Moore string search algorithm。雖然它是爲字符串設計的，但它在字節數組上應該也能工作，並且比蠻力解決方案更具性能。

維基百科文章提供了幾個實現，其中包括Java中的一個和C中的一個，因此創建C＃實現應該相當輕鬆。

事實證明，翻譯維基百科的文章的Java實現博耶 - 穆爾的爲C＃（和char到byte）是無痛確實如此。這裏是：

public static class BoyerMoore 
{ 
    public static int IndexOf(byte[] haystack, byte[] needle) 
    { 
     if (needle.Length == 0) 
     { 
      return 0; 
     } 

     int[] charTable = MakeCharTable(needle); 
     int[] offsetTable = MakeOffsetTable(needle); 
     for (int i = needle.Length - 1; i < haystack.Length;) 
     { 
      int j; 
      for (j = needle.Length - 1; needle[j] == haystack[i]; --i, --j) 
      { 
       if (j == 0) 
       { 
        return i; 
       } 
      } 

      i += Math.Max(offsetTable[needle.Length - 1 - j], charTable[haystack[i]]); 
     } 

     return -1; 
    } 

    private static int[] MakeCharTable(byte[] needle) 
    { 
     const int ALPHABET_SIZE = 256; 
     int[] table = new int[ALPHABET_SIZE]; 
     for (int i = 0; i < table.Length; ++i) 
     { 
      table[i] = needle.Length; 
     } 

     for (int i = 0; i < needle.Length - 1; ++i) 
     { 
      table[needle[i]] = needle.Length - 1 - i; 
     } 

     return table; 
    } 

    private static int[] MakeOffsetTable(byte[] needle) 
    { 
     int[] table = new int[needle.Length]; 
     int lastPrefixPosition = needle.Length; 
     for (int i = needle.Length - 1; i >= 0; --i) 
     { 
      if (IsPrefix(needle, i + 1)) 
      { 
       lastPrefixPosition = i + 1; 
      } 

      table[needle.Length - 1 - i] = lastPrefixPosition - i + needle.Length - 1; 
     } 

     for (int i = 0; i < needle.Length - 1; ++i) 
     { 
      int slen = SuffixLength(needle, i); 
      table[slen] = needle.Length - 1 - i + slen; 
     } 

     return table; 
    } 

    private static bool IsPrefix(byte[] needle, int p) 
    { 
     for (int i = p, j = 0; i < needle.Length; ++i, ++j) 
     { 
      if (needle[i] != needle[j]) 
      { 
       return false; 
      } 
     } 

     return true; 
    } 

    private static int SuffixLength(byte[] needle, int p) 
    { 
     int len = 0; 
     for (int i = p, j = needle.Length - 1; i >= 0 && needle[i] == needle[j]; --i, --j) 
     { 
      len += 1; 
     } 

     return len; 
    } 
}

該算法花費了一個線性位的時間在開始，建立它的表;從那以後它的速度非常快。

來源

2016-01-26 08:35:47

如果我理解正確，你想說如果small是bytes的子序列。你可以在字節上找到循環。由於處理器緩存，它應該運行得非常快。

for (int i = 0, index = 0; i < bytes.Length; ++i) 
    if (bytes[i] == small[index]) { 
     if (++index >= small.Length) { 
     return true; 
     } 
    } 
    return false;

來源

2016-01-26 13:26:30 Sorin

這可能會很快。但它不起作用。 –

@PetterHesselberg你的意思是它沒有正確地檢測子序列，或者它沒有解決頂部的混淆問題？ – Sorin

它甚至不按原樣編譯;該代碼包含一個左大括號和兩個右大括號。但這是一個狡猾，容易修復的問題。問題是它會發現誤報 - 如果'bytes'中的字節存在於'bytes'中，但其中的其他值存在，則您的算法仍會返回'true'。我確實相信OP需要連續的字節序列。 –

如何檢查是否字節數組包含另一個數組

回答

相關問題