2013-08-26 245 views
0

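I want to compare two input CSV files to see whether any rows have been added or removed. What is the best way to do this? I am not using column names, because the column names are not consistent across the files.

Compare two Excel files for differences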

private void compare_btn_Click(object sender, EventArgs e)
{
    string firstFile = firstExcel_txt.Text;
    var results = ReadExcel(openFileDialog1);
    string secondFile = secondExcel_txt.Text;
    var results2 = ReadExcel(openFileDialog2);
}

Reading:

public object ReadExcel(OpenFileDialog openFileDialog)
{
    var _excelFile = new ExcelQueryFactory(openFileDialog.FileName);
    var _info = from c in _excelFile.WorksheetNoHeader() select c;
    string header1, header2, header3;
    foreach (var item in _info)
    {
        header1 = item.ElementAt(0);
        header2 = item.ElementAt(1);
        header3 = item.ElementAt(2);
    }
    return _info;
}

Any help on how I can do this would be great.

+1

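The best and most accurate way is to convert both files to byte arrays and compare them once they are converted. The following link will help you convert an Excel sheet to a byte array: http://www.c-sharpcorner.com/UploadFile/1a81c5/convert-file-to-byte-array-and-byte-array-to-files/ – Max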

+0

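Masriyah, do you have only 3 columns, or did you just simplify your code? I don't see where you keep the contents of the Excel files in order to perform the comparison. –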

+0

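Or you could drop the column names and hash the rest. If the hashes of the two files match, they contain the same data, verbatim. Depending on the algorithm used, there is a small chance of a hash collision, but it is so small that hell will freeze over before you hit one. – Renan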
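The whole-file hashing idea from the comment above could be sketched roughly as follows, assuming the inputs are plain-text CSV files whose first line holds the column names. `FileBodyHash`, `HashWithoutHeader` and `SameData` are hypothetical names used only for illustration, and SHA-256 is just one possible algorithm. Like any whole-file comparison, this only tells you whether the data differs, not which rows were added or removed.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: compares two CSV files while ignoring their header lines.
public static class FileBodyHash
{
    // Hashes every line after the first line (assumed to be the header).
    public static string HashWithoutHeader(string path)
    {
        // Join the data lines with '\n' so line boundaries are part of the hashed input.
        string body = string.Join("\n", File.ReadLines(path).Skip(1));
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(body));
            return BitConverter.ToString(digest);
        }
    }

    // True when both files carry the same data lines in the same order.
    public static bool SameData(string pathA, string pathB)
    {
        return HashWithoutHeader(pathA) == HashWithoutHeader(pathB);
    }
}
```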

Answers

1

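I suggest computing a hash for each row of the Excel file. You can then compare each row's hash to see whether it matches any hash in the other file (see the comments in the source code).

I have also provided some classes to store the contents of the Excel files.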

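using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Windows.Forms;
using LinqToExcel; // ExcelQueryFactory (assuming the LinqToExcel library is what the question uses)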

private void compare_btn_Click(object sender, EventArgs e)
{
    string firstFile = firstExcel_txt.Text;
    ExcelInfo file1 = ReadExcel(openFileDialog1);

    string secondFile = secondExcel_txt.Text;
    ExcelInfo file2 = ReadExcel(openFileDialog2);

    CompareExcels(file1, file2);
}

public void CompareExcels(ExcelInfo fileA, ExcelInfo fileB)
{
    foreach (ExcelRow rowA in fileA.excelRows)
    {
        // If the hash of a row in fileA does not exist in fileB, the row was removed
        if (!fileB.ContainsHash(rowA.hash))
        {
            Console.WriteLine("Row removed: " + rowA.ToString());
        }
    }

    foreach (ExcelRow rowB in fileB.excelRows)
    {
        // If the hash of a row in fileB does not exist in fileA, the row was added
        if (!fileA.ContainsHash(rowB.hash))
        {
            Console.WriteLine("Row added: " + rowB.ToString());
        }
    }
}

public class ExcelRow
{
    public List<String> lstCells;
    public byte[] hash;

    public ExcelRow()
    {
        lstCells = new List<String>();
    }

    // Returns the cells of the row as comma-separated text
    public override string ToString()
    {
        return string.Join(",", lstCells);
    }

    public void CalculateHash()
    {
        byte[] rowBytes;
        byte[] cellBytes;
        int pos;
        int numRowBytes;

        // Determine how many bytes are required to store a single Excel row
        numRowBytes = 0;
        foreach (string cellText in lstCells)
        {
            numRowBytes += NumBytes(cellText);
        }

        // Allocate space to calculate the hash of a single row
        rowBytes = new byte[numRowBytes];
        pos = 0;

        // Concatenate the text of each cell, converted to bytes, into a single byte array
        foreach (string cellText in lstCells)
        {
            cellBytes = GetBytes(cellText);
            System.Buffer.BlockCopy(cellBytes, 0, rowBytes, pos, cellBytes.Length);
            // Advance the write position past the bytes just copied
            pos += cellBytes.Length;
        }

        hash = new MD5CryptoServiceProvider().ComputeHash(rowBytes);
    }

    // Number of bytes needed to hold the UTF-16 characters of a string
    static int NumBytes(string str)
    {
        return str.Length * sizeof(char);
    }

    // Copies the UTF-16 characters of a string into a byte array
    static byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[NumBytes(str)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }
}
public class ExcelInfo
{
    public List<ExcelRow> excelRows;

    public ExcelInfo()
    {
        excelRows = new List<ExcelRow>();
    }

    // Returns true if any stored row has the given hash
    public bool ContainsHash(byte[] hashToLook)
    {
        foreach (ExcelRow eRow in excelRows)
        {
            if (EqualHash(eRow.hash, hashToLook))
            {
                return true;
            }
        }
        return false;
    }

    // Compares two hashes byte by byte
    public static bool EqualHash(byte[] hashA, byte[] hashB)
    {
        if (hashA.Length != hashB.Length)
        {
            return false;
        }
        for (int i = 0; i < hashA.Length; i++)
        {
            if (hashA[i] != hashB[i])
            {
                return false;
            }
        }
        return true;
    }
}

public ExcelInfo ReadExcel(OpenFileDialog openFileDialog)
{
    var _excelFile = new ExcelQueryFactory(openFileDialog.FileName);
    var _info = from c in _excelFile.WorksheetNoHeader() select c;

    ExcelInfo resp = new ExcelInfo();

    foreach (var item in _info)
    {
        ExcelRow excelRow = new ExcelRow();

        // Add all the cells of the row, from the first to the last
        foreach (var cell in item)
        {
            excelRow.lstCells.Add(cell.ToString());
        }

        // Calculate the hash of the row
        excelRow.CalculateHash();

        // Add the row to the ExcelInfo object
        resp.excelRows.Add(excelRow);
    }
    return resp;
} 
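
The nested loops in `CompareExcels` and `ContainsHash` compare every row of one file against every row of the other. As a rough alternative sketch (not part of the answer above), the rows can be put into a `HashSet<string>` so that each lookup is a single set operation. `RowDiff`, `RowKey` and `MissingFrom` are hypothetical names, and the sketch assumes the unit-separator character `\u001F` never appears in cell text, so that joining cells cannot make two different rows look alike.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: finds rows that are present in one file but not in the other.
public static class RowDiff
{
    // Builds a lookup key for a row; '\u001F' is assumed not to occur in cell text.
    public static string RowKey(IEnumerable<string> cells)
    {
        return string.Join("\u001F", cells);
    }

    // Returns the rows of 'source' (as comma-separated text) that do not occur in 'other'.
    public static List<string> MissingFrom(
        IEnumerable<IEnumerable<string>> source,
        IEnumerable<IEnumerable<string>> other)
    {
        var otherKeys = new HashSet<string>(other.Select(row => RowKey(row)));
        return source
            .Where(row => !otherKeys.Contains(RowKey(row)))
            .Select(row => string.Join(",", row))
            .ToList();
    }
}
```

With the rows of the first file in `rowsA` and those of the second in `rowsB` (for example, the `lstCells` lists collected by `ReadExcel`), `RowDiff.MissingFrom(rowsA, rowsB)` would give the removed rows and `RowDiff.MissingFrom(rowsB, rowsA)` the added ones.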
+0

I will keep trying and let you know. Thanks. – Masriyah

+0

In the ReadExcel method, `return _info` throws an error for me: it says I am missing a cast and that it cannot convert from the LINQ IQueryable to ExcelInfo (ExcelFileReader). – Masriyah

+1

@Masriyah Sorry, you need to "return" –

0

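The most accurate way is to convert both files to byte arrays and check for differences once both have been converted. The following link has a simple example of how to convert an Excel sheet to a byte array: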

Convert Excel to Byte[]

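Now that you have converted both of your Excel sheets to a byte[], you can check whether they differ by testing whether the byte arrays are equal, yes or no.

The check can be done in several ways; the following one uses LINQ: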

using System.Linq; //SequenceEqual 

byte[] FirstExcelFileBytes = GetFirstExcelFile();
byte[] SecondExcelFileBytes = GetSecondExcelFile();

if (FirstExcelFileBytes.SequenceEqual(SecondExcelFileBytes))
{
    MessageBox.Show("Arrays are equal");
}
else
{
    MessageBox.Show("Arrays don't match");
}

There are plenty of other ways to compare byte arrays; you should do some research into which one suits you best.
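
`GetFirstExcelFile()` and `GetSecondExcelFile()` are not defined in the snippet above. One possible way to obtain the bytes, shown here only as a sketch, is `File.ReadAllBytes`. `FileBytes` and `AreIdentical` are hypothetical names. Note that this is a strict test: .xlsx files are ZIP packages that also store metadata, so two workbooks with the same cell values may still differ byte for byte.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: byte-for-byte comparison of two files on disk.
public static class FileBytes
{
    // Returns true only when both files have identical length and content.
    public static bool AreIdentical(string pathA, string pathB)
    {
        byte[] first = File.ReadAllBytes(pathA);
        byte[] second = File.ReadAllBytes(pathB);
        // Cheap length check first; SequenceEqual then compares element by element.
        return first.Length == second.Length && first.SequenceEqual(second);
    }
}
```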

Use the following link to check for things like rows added or rows removed:

Compare excelsheets

+0

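I am sure this will be a great help. I am looking for something more along the lines of detecting that a row was added or removed - is that possible? – Masriyah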

+0

The linked page is meant for exactly that; comparing byte arrays will only return true or false. – Max