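Analysing the bigrams in a string of text using Python

I am trying to use Python to help me crack Vigenère ciphers. I am fairly new to programming, but I have managed to write an algorithm that analyses bigram frequencies in a string of text. Here is what I have so far: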

import nltk, string 
from nltk import bigrams 

Ciphertext = str(input("What is the text to be analysed?")) 

#Removes spacing and punctuation to make the text easier to analyse 
def Remove_Formatting(str): 
    str = str.upper() 
    str = str.strip() 
    str = str.replace(' ','') 
    str = str.translate(str.maketrans({a:None for a in string.punctuation})) 
    return str 

Ciphertext = Remove_Formatting(Ciphertext) 

#Score is meant to increase if most common bigrams are in the text 
def Bigram(str): 
    Common_Bigrams = ['TH',  'EN',  'NG', 
         'HE',  'AT',  'AL', 
         'IN',  'ED',  'IT', 
         'ER',  'ND',  'AS', 
         'AN',  'TO',  'IS', 
         'RE',  'OR',  'HA', 
         'ES',  'EA',  'ET', 
         'ON',  'TI',  'SE', 
         'ST',  'AR',  'OU', 
         'NT',  'TE',  'OF'] 
    Bigram_score = int(0) 
    for bigram in str: 
     if bigram in Common_Bigrams: 
      Bigram_score += 1 
      return Bigram_score 

Bigram(Ciphertext) 

print (Bigram_score)

However, when I try to run this on a piece of text, I get this error:

Traceback (most recent call last): 
    File "C:/Users/Tony/Desktop/Bigrams.py", line 36, in <module> 
    print (Bigram_score) 
NameError: name 'Bigram_score' is not defined

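What does this mean? I thought I had already defined Bigram_score as a variable, and I have tried everything I can think of, but it still returns an error one way or another. What am I doing wrong? Please help...

Thanks in advance,

Tony

Source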

2016-11-28 Tony Zhang

You can make Bigram_score global, like this:

def Bigram(string): # don't override str 
    global Bigram_score 
    Common_Bigrams = ['TH',  'EN',  'NG', 
         'HE',  'AT',  'AL', 
         'IN',  'ED',  'IT', 
         'ER',  'ND',  'AS', 
         'AN',  'TO',  'IS', 
         'RE',  'OR',  'HA', 
         'ES',  'EA',  'ET', 
         'ON',  'TI',  'SE', 
         'ST',  'AR',  'OU', 
         'NT',  'TE',  'OF'] 
    Bigram_score = 0 # that 0 is an integer is implicitly understood
    for bigram in string:
        if bigram in Common_Bigrams:
            Bigram_score += 1
    return Bigram_score # return only after the whole loop has finished

Alternatively, you can bind the value returned by the Bigram function to a variable, like this:

Bigram_score = Bigram(Ciphertext) 

print(Bigram_score)

Or:

print(Bigram(Ciphertext))

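When you assign a value to a variable inside a function, that variable is local to the function. If a function returns something, the return value has to be bound to a variable before it can be reused (or it has to be used directly, as in the print call above).

Here is an example of how that works: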

spam = "spam" # global spam variable 

def change_spam(): 
    spam = "ham" # setting the local spam variable 
    return spam 

change_spam() 
print(spam) # prints spam 

spam = change_spam() # here we assign the returned value to global spam 
print(spam) # prints ham
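
To contrast this with the `global` approach, here is a minimal sketch in the same spirit. The names `counter`, `bump_local` and `bump_global` are made up for illustration and are not part of the question's code:

```python
counter = 0  # module-level (global) variable, hypothetical name

def bump_local():
    counter = 1  # assignment creates a new local name; the global is untouched
    return counter

def bump_global():
    global counter  # assignments now rebind the module-level name
    counter += 1
    return counter

bump_local()
after_local = counter   # the global still holds 0 here
bump_global()
after_global = counter  # the global has been rebound to 1

print(after_local, after_global)
```

Returning the value and binding it at the call site is usually easier to follow than `global`, because the function then has no hidden side effects.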

Also, your for loop iterates over unigrams (single characters), not bigrams. Let's take a closer look:

for x in "hellothere": 
    print(x)

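This prints unigrams. So let's rename the bigram variable in your code to make the logic problem visible.

for unigram in string:
    if unigram in Common_Bigrams:
        print("bigram hit!")

Since no unigram is ever equal to a bigram, "bigram hit!" will never be printed. We can get at the bigrams in a different way, using a while loop and an index.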

index = 0 
n = 2 # for bigrams 
while index < len(string)-(n-1): # minus the length of n-1 (n-grams) 
    ngram = string[index:index+n] # collect ngram 
    index += 1 # important to add this, otherwise the loop is eternal! 
    print(ngram)

After that, just put whatever you want to do with each bigram inside the loop.

Source
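
For instance, the scoring from the question could be done inside that loop. This is only a sketch under a few assumptions: the function name `bigram_score` and the constant `COMMON_BIGRAMS` are my own, the bigram list is copied from the question, and the input is assumed to be upper-cased with spaces and punctuation already removed.

```python
# Bigram list copied from the question; stored as a set for fast lookups.
COMMON_BIGRAMS = {
    "TH", "EN", "NG", "HE", "AT", "AL", "IN", "ED", "IT", "ER",
    "ND", "AS", "AN", "TO", "IS", "RE", "OR", "HA", "ES", "EA",
    "ET", "ON", "TI", "SE", "ST", "AR", "OU", "NT", "TE", "OF",
}

def bigram_score(text, n=2):
    """Count the overlapping n-grams of text that are in COMMON_BIGRAMS."""
    score = 0
    index = 0
    while index < len(text) - (n - 1):
        ngram = text[index:index + n]  # slice out the current n-gram
        if ngram in COMMON_BIGRAMS:
            score += 1
        index += 1  # step one character, so the n-grams overlap
    return score  # returned after the loop, not inside it

print(bigram_score("HELLOTHERE"))  # prints 5 (HE, TH, HE, ER, RE)
```

Because the function returns the score, it needs no `global` statement; the caller binds the result with something like `score = bigram_score(Ciphertext)`.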

2016-11-28 17:19:07 internetional

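I have tried the 'global' approach and it works, thank you!

However, Bigram_score turned out to be '0', which is not what I expected. Is there something wrong with the algorithm?

Yes, I can see a problem with your algorithm. It is in the for loop, where you seem to be looping over each individual character. In effect, you are looking at unigrams and asking whether they are equal to any of the bigrams. They never will be, which gives you a score of 0. Or at least that is my guess. – internetional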
