2013-12-03 74 views
4

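Finding words in a paragraph with no spaces?

I am using Python to build a Caesar cipher decrypter. It works and decrypts the already encrypted word. However, it shows all of its brute-force decryption attempts. For example, "HELLO" encrypted with a key of 3 is KHOOR, and the output of decrypting it is "KHOORJGNNQIFMMPHELLOGDKKNFCJJMEBIILDAHHKCZGGJBYFFIAXEEHZWDDGYVCCFXUBBEWTAADVSZZCURYYBTQXXASPWWZROVVYQNUUXPMTTWOLSSVNKRRUMJQQTLIPPS". I am wondering if there is a way to use a dictionary with Python to search for an English word in this output, or whether I can improve my code to print out only known English words. Apologies if this has been asked before; I searched around and couldn't seem to find the right thing.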

+2

It seems like letter frequencies would be very helpful, and letter-pair frequencies as well –

+1

What @JoranBeasley said is the standard approach to cryptanalysis of substitution ciphers. Andrew, you should stick with it. – Hyperboreus

+0

See the edit to my answer. I have implemented a naive approach using frequencies. – Hyperboreus

Answers

4
englishWords = ['HELLO', 'ME', 'AXE', 'FOO', 'BAR', 'BAZ'] #and many more 
cypher = 'KHOORJGNNQIFMMPHELLOGDKKNFCJJMEBIILDAHHKCZGGJBYFFIAXEEHZWDDGYVCCFXUBBEWTAADVSZZCURYYBTQXXASPWWZROVVYQNUUXPMTTWOLSSVNKRRUMJQQTLIPPS' 

for word in englishWords: 
    if word not in cypher: continue 
    print('Found "{}"'.format(word)) 

This yields:

Found "HELLO" 
Found "ME" 
Found "AXE" 

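If this is about checking whether the key used for deciphering a text is the correct one, i.e. whether the result might be English words, I wouldn't look for words. Instead, I would try to find clusters in the result that do not comply with English syllable patterns.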
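One way to sketch the cluster idea is to score each of the 26 candidate shifts by its longest run of consecutive consonants and keep the candidate with the shortest run. This is only an illustration: the names `shift`, `longest_consonant_run` and `crack_by_clusters` are hypothetical, Y is treated as a consonant for simplicity, and very short inputs can tie (in which case the lowest key wins).

```python
VOWELS = set('AEIOU')

def shift(text, key):
    """Caesar-shift uppercase letters by `key`; leave other characters alone."""
    return ''.join(
        chr((ord(c) - ord('A') + key) % 26 + ord('A')) if 'A' <= c <= 'Z' else c
        for c in text
    )

def longest_consonant_run(text):
    """Length of the longest run of consecutive consonant letters."""
    longest = current = 0
    for c in text:
        if 'A' <= c <= 'Z' and c not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            # A vowel or any non-letter (space, punctuation) ends the run.
            current = 0
    return longest

def crack_by_clusters(ciphertext):
    """Return the candidate decryption whose longest consonant cluster is shortest."""
    candidates = [shift(ciphertext.upper(), key) for key in range(26)]
    return min(candidates, key=longest_consonant_run)
```

A longer ciphertext gives this heuristic more to work with; on a single short word it is unlikely to be decisive on its own.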


Here is a very naive implementation that scans using letter frequencies:

#! /usr/bin/python3 

plain = 'Z RD NFEUVIZEX ZW KYVIV ZJ R NRP KF LJV R UZTKZFERIP NZKY GPKYFE KF JVRITY WFI RE VEXCZJY NFIU ZE KYZJ FLKGLK FI TRE Z ZDGIFMV DP TFUV KF FECP GIZEK FLK BEFNE VEXCZJY NFIUJ. RGFCFXZVJ ZW KYZJ YRJ SVVE RJBVU SVWFIV, Z JVRITYVU RIFLEU REU TFLCUE\'K JVVD KF WZEU KYV IZXYK KYZEX.'.upper() 

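# Expected relative frequencies of the most common English letters, as fractions of all letters.
freqs = {'E': 0.127, 'T': 0.091, 'A': 0.082, 'O': 0.075, 'I': 0.070}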

def cypher(text, key): 
    return ''.join(chr((ord(c) - ord('A') + key) % 26 + ord('A')) if 'A' <= c <= 'Z' else c for c in text) 


def crack(text):
    # Count letters only, so spaces and punctuation do not dilute the frequencies.
    length = sum(1 for c in text if 'A' <= c <= 'Z') or 1
    best = float('inf')
    bestMatch = ''
    for key in range(26):
        cand = cypher(text, key)
        # Sum of squared differences between observed and expected frequencies.
        quality = 0
        for l, expected in freqs.items():
            quality += (cand.count(l) / length - expected) ** 2
        if quality < best:
            best = quality
            bestMatch = cand
    return bestMatch

print(crack(plain)) 

Here are three examples:

Input: TQ ESTD TD LMZFE DPPTYR, TQ ESP VPJ QZC OPNJASPCTYR L EPIE TD ESP NZCCPNE ZYP, T.P. TQ ESP CPDFWE XTRSE MP PYRWTDS HZCOD, T HZFWOY'E WZZV QZC HZCOD, MFE ECJ EZ QTYO NWFDEPCD TYDTOP ESP CPDFWE HSTNS OZ YZE NZXAWJ HTES ESP PYRWTDS DJWWLMWP LALCEFD. 

Output: IF THIS IS ABOUT SEEING, IF THE KEY FOR DECYPHERING A TEXT IS THE CORRECT ONE, I.E. IF THE RESULT MIGHT BE ENGLISH WORDS, I WOULDN'T LOOK FOR WORDS, BUT TRY TO FIND CLUSTERS INSIDE THE RESULT WHICH DO NOT COMPLY WITH THE ENGLISH SYLLABLE APARTUS. 

Input: KWSJUZAFY XGJ WFYDAKZ OGJVK AF S TDGUC GX MFVAXXWJWFLASLWV LWPL DACW LZSL AK UWJLSAFDQ HGKKATDW, SFV VGAFY AL WXXAUAWFLDQ AK S YWFMAFWDQ AFLWJWKLAFY HJGTDWE. TML AL'K HJGTDWESLAU XGJ DGLK GX JWSKGFK, BMKL GFW GX OZAUZ AK LZSL QGMJ WFUJQHLWV LWPL ESQ AFUDMVW LWPL LZSL JSFVGEDQ ZSHHWFK LG XGJE SF WFYDAKZ OGJV UGEHDWLWDQ TQ UZSFUW. 

Output: SEARCHING FOR ENGLISH WORDS IN A BLOCK OF UNDIFFERENTIATED TEXT LIKE THAT IS CERTAINLY POSSIBLE, AND DOING IT EFFICIENTLY IS A GENUINELY INTERESTING PROBLEM. BUT IT'S PROBLEMATIC FOR LOTS OF REASONS, JUST ONE OF WHICH IS THAT YOUR ENCRYPTED TEXT MAY INCLUDE TEXT THAT RANDOMLY HAPPENS TO FORM AN ENGLISH WORD COMPLETELY BY CHANCE. 

Input: QZC PILXAWP, UFDE ESP EPIE JZF'GP AZDEPO SPCP TYNWFOPD SPWW, TQ, WTA, WZR, LDA LYO ACZMLMWJ ZESPCD. JZF NZFWO ECTX OZHY ESP LWEPCYLETGPD MJ ZYWJ DPLCNSTYR QZC HZCOD ESP DLXP WPYRES LD JZFC ELCRPE HZCO, LYO ZYWJ QZC HZCOD HTES ESP DLXP WPEEPC ALEEPCY. MFE ESLE'D CPLWWJ BFTEP L WZE ZQ HZCV EZ RPE LCZFYO ESP QLNE ESLE JZFC TYTETLW ZFEAFE SLD L WZE ZQ FDPWPDD OLEL TY TE. 

Output: FOR EXAMPLE, JUST THE TEXT YOU'VE POSTED HERE INCLUDES HELL, IF, LIP, LOG, ASP AND PROBABLY OTHERS. YOU COULD TRIM DOWN THE ALTERNATIVES BY ONLY SEARCHING FOR WORDS THE SAME LENGTH AS YOUR TARGET WORD, AND ONLY FOR WORDS WITH THE SAME LETTER PATTERN. BUT THAT'S REALLY QUITE A LOT OF WORK TO GET AROUND THE FACT THAT YOUR INITIAL OUTPUT HAS A LOT OF USELESS DATA IN IT. 

And here is a last example, without spaces or punctuation:

Input: ZRDLJZEXGPKYFEKFSLZCURTRVJRITZGYVIUVTIPGKVIZKNFIBJREUUVTIPGKJKYVRCIVRUPVETIPGKVUNFIUYFNVMVIZKJYFNJRCCZKJ SILKVWFITVUVTIPGKZFERKKVDGKJWFIVORDGCVYVCCFVETIPGKVUNZKYRBVPFW3ZJBYFFI 

Output: IAMUSINGPYTHONTOBUILDACAESARCIPHERDECRYPTERITWORKSANDDECRYPTSTHEALREADYENCRYPTEDWORDHOWEVERITSHOWSALLITS BRUTEFORCEDECRYPTIONATTEMPTSFOREXAMPLEHELLOENCRYPTEDWITHAKEYOF3ISKHOOR 
+0

I added some examples. – Hyperboreus

3

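Searching for English words in a block of undifferentiated text like that is certainly possible, and doing it efficiently is a genuinely interesting problem. But it's problematic for lots of reasons, just one of which is that your encrypted text may include text that randomly happens to form an English word completely by chance.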

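For example, just the text you've posted here includes HELL, IF, LIP, LOG, ASP and probably others. You could trim down the alternatives by searching only for words the same length as your target word, and only for words with the same letter pattern. But that's really quite a lot of work to get around the fact that your initial output has a lot of useless data in it.

You can check fairly easily whether a particular word is in an English dictionary like this:

  1. Read in a line-by-line dictionary file (/usr/share/dict/words on most systems).
  2. Strip whitespace, convert to lowercase, and store each line in a Python dictionary.
  3. After decrypting each word, check whether it exists as a key in the Python dictionary.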

Taking this approach probably makes more sense than trying to pick through your undifferentiated initial output.
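The steps above might look roughly like the following. This is a minimal sketch rather than a definitive implementation: the function names are hypothetical, a `set` stands in for the Python dictionary (membership tests behave the same way), and the word-list path is an assumption that may not exist on every system.

```python
def load_words(path='/usr/share/dict/words'):
    """Read one word per line, strip whitespace, lowercase, and store in a set."""
    with open(path, encoding='utf-8', errors='ignore') as handle:
        return {line.strip().lower() for line in handle if line.strip()}

def decrypt(word, key):
    """Undo a Caesar shift of `key` on the letters of `word`."""
    return ''.join(
        chr((ord(c) - ord('A') - key) % 26 + ord('A')) if 'A' <= c <= 'Z' else c
        for c in word.upper()
    )

def known_decryptions(word, words):
    """Return (key, plaintext) pairs whose plaintext is a dictionary word."""
    results = []
    for key in range(26):
        candidate = decrypt(word, key)
        if candidate.lower() in words:
            results.append((key, candidate))
    return results
```

With a loaded word list, `known_decryptions('KHOOR', load_words())` would print nothing but the shifts that produce real words, instead of all 26 attempts.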

+0

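A way around that problem (when working with passages of data rather than single words) is to require a minimum number of matches based on the total character count. – Razick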
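As a rough sketch of that suggestion, a candidate could be accepted only if it contains at least one dictionary word per so many characters. The function names and the `chars_per_match=10` ratio are assumptions chosen for illustration, not values taken from the comment.

```python
def count_matches(text, words):
    """Number of dictionary words that occur anywhere in `text`."""
    return sum(1 for word in words if word in text)

def looks_english(text, words, chars_per_match=10):
    """Require at least one matching word per `chars_per_match` characters."""
    required = max(1, len(text) // chars_per_match)
    return count_matches(text, words) >= required
```

Under this rule a long run of gibberish that happens to contain one or two short words is rejected, because the number of matches it needs grows with its length.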