2014-04-17 87 views
0

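MD5 hash returning different results in Python

For a class assignment, I'm supposed to take the contents of a file, compute the MD5 hash, and store it in a separate file. Then I should be able to check integrity by comparing the MD5 hashes. I'm relatively new to Python and JSON, so I thought I'd try to tackle those with this assignment rather than stick with something I already know.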

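Anyway, my program reads from a file, creates a hash, and stores that hash in a JSON file just fine. The problem is in my integrity check. When I compute the hash of the file again, the result differs from what is recorded in the JSON file, even though no changes have been made to the file. Below is an example of what happens, followed by my code. Thanks in advance for your help.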

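For example, this is what is in my JSON file:

Contents: b'I made a file to test the md5\n'

Digest: 1e8f4e6598be2ea2516102de54e7e48e

This is what is returned when I try to check the integrity of the exact same file (no changes made to it):

Contents: b'I made a file to test the md5\n'

Digest: ef8b7bf2986f59f8a51aae6b496e8954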

import hashlib 
import json 
import os 
import fnmatch 
from codecs import open 


#opens the file, reads/encodes it, and returns the contents (c)
def read_the_file(f_location):
    with open(f_location, 'r', encoding="utf-8") as f:
        c = f.read()

    f.close()
    return c


def scan_hash_json(directory_content):
    for f in directory_content:
        location = argument + "/" + f
        content = read_the_file(location)
        comp_hash = create_hash(content)
        json_obj = {"Directory": argument, "Contents": {"filename": str(f),
                    "original string": str(content), "md5": str(comp_hash)}}
        location = location.replace(argument, "")
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)


#scans the file, creates the hash, and writes it to a json file 
def read_the_json(f): 
    f_location = "recorded" + "/" + f 
    read_json = open(f_location, "r") 
    json_obj = json.load(read_json) 
    read_json.close() 
    return json_obj 


#check integrity of the file
def check_integrity(d_content):
    #d_content = directory content
    for f in d_content:
        json_obj = read_the_json(f)
        text = f.replace(".json", ".txt")
        result = find(text, os.getcwd())
        content = read_the_file(result)
        comp_hash = create_hash(content)
        print("content: " + str(content))
        print(result)
        print(json_obj)
        print()
        print("Json Obj: " + json_obj['Contents']['md5'])
        print("Hash: " + comp_hash)


#find the file being searched for
def find(pattern, path):
    result = ""
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result = os.path.join(root, name)
    return result


#create a hash for the file contents being passed in 
def create_hash(content): 
    h = hashlib.md5() 
    key_before = "reallyBad".encode('utf-8') 
    key_after = "hashKeyAlgorithm".encode('utf-8') 
    content = content.encode('utf-8') 
    h.update(key_before) 
    h.update(content) 
    h.update(key_after) 
    return h.hexdigest() 


#write the MD5 hash to the json file 
def write_to_json(arg, json_obj): 
    arg = arg.replace(".txt", ".json") 
    storage_location = "recorded/" + str(arg) 
    write_file = open(storage_location, "w") 
    json.dump(json_obj, write_file, indent=4, sort_keys=True) 
    write_file.close() 

#variable to hold status of user (whether they are done or not) 
working = 1 
#while the user is not done, continue running the program 
while working == 1: 
    print("Please input a command. For help type 'help'. To exit type 'exit'") 

    #grab input from user, divide it into words, and grab the command/option/argument 
    request = input() 
    request = request.split() 

    if len(request) == 1:
        command = request[0]
    elif len(request) == 2:
        command = request[0]
        option = request[1]
    elif len(request) == 3:
        command = request[0]
        option = request[1]
        argument = request[2]
    else:
        print("I'm sorry that is not a valid request.\n")
        continue

    #if user inputs command 'icheck'...
    if command == 'icheck':
        if option == '-l':
            if argument == "":
                print("For option -l, please input a directory name.")
                continue

            try:
                dirContents = os.listdir(argument)
                scan_hash_json(dirContents)

            except OSError:
                print("Directory not found. Make sure the directory name is correct or try a different directory.")

        elif option == '-f':
            if argument == "":
                print("For option -f, please input a file name.")
                continue

            try:
                contents = read_the_file(argument)
                computedHash = create_hash(contents)
                jsonObj = {"Directory": "Default", "Contents": {
                    "filename": str(argument), "original string": str(contents), "md5": str(computedHash)}}

                write_to_json(argument, jsonObj)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")

        elif option == '-t':
            try:
                dirContents = os.listdir("recorded")
                check_integrity(dirContents)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")

        elif option == '-u':
            print("gonna update stuff")
        elif option == '-r':
            print("gonna remove stuff")

    #if user inputs command 'help'...
    elif command == 'help':
        #display help screen
        print("Integrity Checker has a few options you can use. Each option "
              "must begin with the command 'icheck'. The options are as follows:")
        print("\t-l <directory>: Reads the list of files in the directory and computes the md5 for each one")
        print("\t-f <file>: Reads a specific file and computes its md5")
        print("\t-t: Tests integrity of the files with recorded md5s")
        print("\t-u <file>: Update a file that you have modified after its integrity has been checked")
        print("\t-r <file>: Removes a file from the recorded md5s\n")

    #if user inputs command 'exit'
    elif command == 'exit':
        #set working to zero and exit program loop
        working = 0

    #if anything other than 'icheck', 'help', and 'exit' are input...
    else:
        #display error message and start over
        print("I'm sorry that is not a valid command.\n")

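Can you include your main() method as well? Are you creating the hash JSON file and then checking the integrity of the hash within a single run of the application? Do you use the create_hash function to create the initial hash and then again to verify it? – dano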


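I've included my main method. The JSON files have already been created. When the user enters "icheck -t", the program goes through the directory where the .json files are stored, finds the matching .txt files, computes their hashes, and compares them with what is inside the .json files. – SRod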


Thanks, that confirms my suspicion. See my answer below for what I believe the problem is. – dano

Answers

0

Where do you define h, the md5 object used in this method?

#create a hash for the file contents being passed in 
def create_hash(content): 
    key_before = "reallyBad".encode('utf-8') 
    key_after = "hashKeyAlgorithm".encode('utf-8') 
    print("Content: " + str(content)) 
    h.update(key_before) 
    h.update(content) 
    h.update(key_after) 
    print("digest: " + str(h.hexdigest())) 
    return h.hexdigest() 

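My suspicion is that you call create_hash twice but use the same md5 object in both calls. That means the second time you call it, you are really hashing "reallyBad *file contents* hashKeyAlgorithmreallyBad *file contents* hashKeyAlgorithm". You should create a new md5 object inside create_hash to avoid this.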
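A minimal sketch of the effect described above, assuming a module-level md5 object like the one the earlier version of the code apparently used. The names `shared_h`, `create_hash_shared`, and `create_hash_fresh` are made up for this illustration and do not appear in the question.

```python
import hashlib

KEY_BEFORE = b"reallyBad"
KEY_AFTER = b"hashKeyAlgorithm"

# Hypothetical module-level object reused by every call (the suspected bug).
shared_h = hashlib.md5()


def create_hash_shared(content):
    # update() appends to everything hashed so far, so each call's digest
    # depends on all earlier calls.
    shared_h.update(KEY_BEFORE)
    shared_h.update(content.encode("utf-8"))
    shared_h.update(KEY_AFTER)
    return shared_h.hexdigest()


def create_hash_fresh(content):
    # A new md5 object per call starts from an empty state.
    h = hashlib.md5()
    h.update(KEY_BEFORE)
    h.update(content.encode("utf-8"))
    h.update(KEY_AFTER)
    return h.hexdigest()


text = "this is a test\n"
first_shared = create_hash_shared(text)
second_shared = create_hash_shared(text)
first_fresh = create_hash_fresh(text)
second_fresh = create_hash_fresh(text)

print(first_shared == second_shared)  # False: state carried over
print(first_fresh == second_fresh)    # True: same input, same digest
```

Only the first call on the shared object agrees with the fresh version; every later call hashes the accumulated input.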

Edit: Here is how the program runs for me after making this change:

Please input a command. For help type 'help'. To exit type 'exit'
icheck -f ok.txt
Content: this is a test

digest: 1f0d0fd698dfce7ce140df0b41ec3729
Please input a command. For help type 'help'. To exit type 'exit'
icheck -t
Content: this is a test

digest: 1f0d0fd698dfce7ce140df0b41ec3729
Please input a command. For help type 'help'. To exit type 'exit'

Edit #2: Your scan_hash_json function also has a bug at the end. You strip the .txt suffix from the file name and then call write_to_json:

def scan_hash_json(directory_content):
    ...
    location = location.replace(".txt", "")
    write_to_json(location, json_obj)

However, write_to_json expects the file name to end in .txt:

def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")

If you fix that, I think everything should work as expected...
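The string handling can be traced in isolation. The sketch below assumes a directory named `docs` and a file named `notes.txt`, both invented for the example, and `json_name` is a hypothetical helper showing one way to derive the record name in a single place.

```python
import os

argument = "docs"      # hypothetical directory name
f = "notes.txt"        # hypothetical file name

# What scan_hash_json does to the path before calling write_to_json.
location = argument + "/" + f
location = location.replace(argument, "")   # "/notes.txt"
location = location.replace(".txt", "")     # "/notes"

# What write_to_json then does: there is no ".txt" left to replace,
# so the record is stored without a ".json" extension.
stored_as = "recorded/" + location.replace(".txt", ".json")
print(stored_as)  # recorded//notes


def json_name(txt_name):
    # Hypothetical helper: swap the extension exactly once.
    base, _ext = os.path.splitext(os.path.basename(txt_name))
    return base + ".json"


print(json_name("docs/notes.txt"))  # notes.json
```

Because the stored name lacks the .json suffix, the later `f.replace(".json", ".txt")` in check_integrity leaves it unchanged, so the lookup may not land on the original .txt file. That would be consistent with -f working while -l does not, though it is an inference from the code rather than something confirmed in the thread.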


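I see what you're saying. I moved my 'h = hashlib.md5()' inside the create_hash function, and although the hashes are different now, they still don't match the way they should. Thanks for the input, though; doing it the previous way was definitely a mistake. – SRod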


Did you clear out the .json files and try again? I just tested your program after making this change, and it works fine for me. See the edited output in my answer. – dano


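It works correctly for the -f option, but not for my -l option. So I must be doing something wrong in my scan_hash_json function. I have to get to school soon, but I'll take another look as soon as I get there. – SRod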

0

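I see two possible problems you are facing:

  1. The hash is computed from the binary representation of a string.
  2. Unless you work only with ASCII, the same international character, e.g. č, has different byte representations in UTF-8 and in other Unicode encodings.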
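To illustrate the two points above, here is a small sketch that encodes the same character two ways and hashes each result. Using UTF-16 as the second encoding is an assumption made for the example; the answer does not say which encoding it has in mind.

```python
import hashlib

text = "\u010d"  # the character č as a single code point

utf8_bytes = text.encode("utf-8")        # two bytes: 0xC4 0x8D
utf16_bytes = text.encode("utf-16-le")   # two bytes: 0x0D 0x01

# MD5 works on bytes, so different encodings give different digests.
utf8_digest = hashlib.md5(utf8_bytes).hexdigest()
utf16_digest = hashlib.md5(utf16_bytes).hexdigest()

print(utf8_bytes != utf16_bytes)    # True
print(utf8_digest != utf16_digest)  # True
```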

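Consider:

  1. If you need UTF-8 or Unicode, normalize your content first, before you save it or compute the hash.
  2. For testing purposes, compare the binary representations of the content.
  3. Use UTF-8 only for IO operations; codecs.open does all the conversion for you:

    from codecs import open
    with open('yourfile', 'r', encoding="utf-8") as f:
        decoded_content = f.read()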
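The normalization advice can be sketched with the standard `unicodedata` module. The helper name `md5_of_text` and the choice of NFC are assumptions for this example; any normalization form works as long as the same one is used when saving and when checking.

```python
import hashlib
import unicodedata

composed = "\u010d"      # č as one code point
decomposed = "c\u030c"   # c followed by a combining caron


def md5_of_text(text):
    # Normalize to NFC before encoding so equivalent strings hash alike.
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# The raw byte representations differ even though both render as č.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False
# After normalization the digests agree.
print(md5_of_text(composed) == md5_of_text(decomposed))        # True
```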


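This helped, but it's still broken. To get it to run, I had to add 'content = content.encode('utf-8')' to the create_hash function. Once I did that, 2 of my hashes now match and the other 3 are still broken. Thanks for your input, it gets me closer to solving this :) – SRod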