2014-07-07

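Gematria function - processing text according to numerical values

I am trying to process a text, namely the Bible, to extract the numerical value of the letters of each word, according to a dictionary: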

def gematria(book): 

    dict = { 
       'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 
       'f':80, 'g':3, 'h':8,'i':10, 'j':10, 
       'k':20, 'l':30, 'm':40, 'n':50, 'o':70, 
       'p':80, 'q':100,'r':200, 's':300, 
       't':400, 'u':6, 'v':6, 'w':800, 'x':60, 
       'y':10, 'z':7 
      } 

Using the NLTK module, I have got this far:

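import re
import nltk

raw = nltk.corpus.gutenberg.raw(book)  # full text of the book as one string
tokens = nltk.word_tokenize(raw)
words_and_numbers = [w.lower() for w in tokens]
# keep tokens that contain at least one character other than a digit or a colon
words = [w for w in words_and_numbers if re.search('[^0-9:]', w)]
vocab = sorted(set(words))
nested = [list(w) for w in vocab]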

I end up with a list of letter lists, one per word, i.e. [['h', 'o', 'l', 'y'], ['b', 'i', 'b', 'l', 'e'], ...]

To process individual words and get their numerical values, the following list comprehensions, each followed by a call to sum(), work:

word_value_1 = [dict[letter] for letter in nested[0]] 
sum(word_value_1) 

word_value_2 = [dict[letter] for letter in nested[1]] 
sum(word_value_2) 

(...) 

Question: how do I write a list comprehension, or a loop, that returns the numerical values of all the words in a book in one big list?

Answers

Assuming nested = [['h', 'o', 'l', 'y'], ['b', 'i', 'b', 'l', 'e']]:

print([sum(dict[letter] for letter in word) for word in nested])

Output

[118, 49] 
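
One thing to watch for when running this over a whole book: tokens such as "god's" contain characters that are not keys in the mapping, so a plain lookup would raise a KeyError. Below is a minimal sketch of one way around that, assuming that unmapped characters should simply count as 0. The names GEMATRIA, word_value and word_values are made up for this example and do not appear in the question; the letter values are copied from it unchanged. Since a Python string can be iterated letter by letter, the intermediate nested list is not needed here.

```python
# Letter values copied from the question. The name GEMATRIA is an assumption
# of this sketch, chosen so that the built-in dict type is not shadowed.
GEMATRIA = {
    'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5,
    'f': 80, 'g': 3, 'h': 8, 'i': 10, 'j': 10,
    'k': 20, 'l': 30, 'm': 40, 'n': 50, 'o': 70,
    'p': 80, 'q': 100, 'r': 200, 's': 300,
    't': 400, 'u': 6, 'v': 6, 'w': 800, 'x': 60,
    'y': 10, 'z': 7,
}


def word_value(word):
    """Sum the letter values of a word; unmapped characters count as 0."""
    return sum(GEMATRIA.get(ch, 0) for ch in word.lower())


def word_values(words):
    """Map each distinct word to its numerical value, in sorted word order."""
    return {w: word_value(w) for w in sorted(set(words))}


values = word_values(['holy', 'bible'])
print(values)  # {'bible': 49, 'holy': 118}
```

Returning a dictionary keeps each word next to its value; if only the numbers are wanted, list(values.values()) gives them in the same sorted-word order as vocab in the question.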

@data_garden Is this what you were expecting?

Yes, thank you @Ashoka Lella

@data_garden You're welcome