Python - Imported 127,000+ words into a list, but the function returns only partial results

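This function is meant to compare all 127,000 words imported from a dictionary file against a length entered by the user, and then return the number of words of that length. It does this, but only to some extent.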

If I enter "15", it returns "0". If I enter "4", it returns "3078".

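I am certain there are words that are 15 characters long, yet it returns "0" regardless. I should also mention that entering any number greater than 15 still gives 0, even though the dictionary has words longer than 15 letters.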

try:
    dictionary = open("dictionary.txt")
except FileNotFoundError:
    print("Dictionary not found")
    exit()


def reduceDict():
    # word_length and showTotal are globals, presumably set elsewhere in the program
    first_list = []

    for line in dictionary:
        line = line.rstrip()
        if len(line) == word_length:
            for letter in line:
                # keep the word only if no character occurs more than once
                if len([ln for ln in line if line.count(ln) > 1]) == 0:
                    if first_list.count(line) < 1:
                        first_list.append(line)
                else:
                    continue
    if showTotal == 'y':
        print('|| The possible words remaining are: ||\n ', len(first_list))
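Before looking at the filtering logic, it can help to confirm that the dictionary really contains words of 15 or more letters. A minimal sketch, assuming one word per line as in dictionary.txt; the function name `length_histogram` and the sample words are hypothetical, for illustration only:

```python
from collections import Counter


def length_histogram(lines):
    """Count distinct words by length (hypothetical helper).

    Assumes one word per line, as in the dictionary file described here.
    """
    words = {line.strip() for line in lines if line.strip()}
    return Counter(len(word) for word in words)


# Hypothetical sample; with the real file you could instead use:
#     with open("dictionary.txt") as f:
#         histogram = length_histogram(f)
sample = ["cat\n", "dogs\n", "chemotherapeutic\n", "uncopyrightable\n", "dogs\n"]
histogram = length_histogram(sample)
for length in sorted(histogram):
    print(length, histogram[length])
```

If the counts for 15 and above are non-zero on the real file, the problem lies in the filter, not in the data.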


2017-05-22 MLJezus

Please show a line of dictionary.txt as sample input so we can see how the input is structured. – bigbounty

The dictionary file has one word per line, e.g. chemotherapeutic – MLJezus

Can you also explain why you need 'for letter in line:'? – kuro

As I read it, this line:

if len([ln for ln in line if line.count(ln) > 1]) == 0:

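requires that the word contain no repeated letters, which would explain why no words are found: once you reach 15 letters, repeated letters are quite common. Since this requirement isn't mentioned in your description, if we drop it, we can write: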

def reduceDict(word_length, showTotal):
    first_list = []

    for line in dictionary:
        line = line.rstrip()

        if len(line) == word_length:
            if line not in first_list:
                first_list.append(line)

    if showTotal:
        print('The number of words of length {} is {}'.format(word_length, len(first_list)))
        print(first_list)

try:
    dictionary = open("dictionary.txt")
except FileNotFoundError:
    exit("Dictionary not found")

reduceDict(15, True)

Against my Unix words file, this turned up about 400,000 words. If we want to restore the unique-letter requirement:

import re

def reduceDict(word_length, showTotal):
    first_list = []

    for line in dictionary:
        line = line.rstrip()

        # (.).*\1 matches any character that appears again later in the word
        if len(line) == word_length and not re.search(r"(.).*\1", line):
            if line not in first_list:
                first_list.append(line)

    if showTotal:
        print('The number of words of length {} is {}'.format(word_length, len(first_list)))
        print(first_list)

This starts returning 0 results at around 13 letters, as one would expect.
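On a large dictionary, the `line not in first_list` test rescans the whole list for every word. A minimal sketch of the same two filters (length, and optionally the `(.).*\1` repeated-character regex) using a set for de-duplication instead, assuming the one-word-per-line input described in the comments; the name `reduce_words`, the `unique_letters` flag and the sample words are hypothetical, not part of the answer:

```python
import re

# Same pattern as in the answer: capture any character (.), then look for
# that same character again later in the word via the backreference \1.
REPEATED = re.compile(r"(.).*\1")


def reduce_words(lines, word_length, unique_letters=False):
    """Return the distinct words of exactly word_length characters (hypothetical helper).

    lines is any iterable of one-word-per-line strings, e.g. an open file.
    A set replaces the list membership test; sorted() gives a stable order.
    """
    found = set()
    for line in lines:
        word = line.rstrip()
        if len(word) != word_length:
            continue
        if unique_letters and REPEATED.search(word):
            continue  # skip words with any repeated character
        found.add(word)
    return sorted(found)


sample = ["uncopyrightable\n", "characteristics\n", "dermatoglyphics\n",
          "uncopyrightable\n", "cat\n"]
print(reduce_words(sample, 15))
print(reduce_words(sample, 15, unique_letters=True))
```

Returning the list instead of printing inside the function also lets the caller decide whether to show the total, which is what `showTotal` was doing.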


2017-05-22 05:32:43 cdlane

In your code, you don't need this line:

for letter in line:

In your list comprehension, if your intention is to iterate over all the words in line, use this:

if len([ln for ln in line.split() if line.count(ln) > 1]) == 0:

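Your list comprehension loops over each character and checks whether that character occurs more than once in line. As a result, if your file contains chemotherapeutic, it will not be added to first_list, because some of its letters occur more than once. So unless your file contains words longer than 14 letters in which every letter occurs only once, your code will not find them.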
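A small sketch, run on the asker's own example word, shows what the two comprehensions actually collect. Note that with `.split()` a one-word line is never rejected, so that version effectively switches the unique-letter filter off. The `set` comparison at the end is an alternative added here for comparison; it is not part of the original answer:

```python
line = "chemotherapeutic"

# Iterating over a string yields its characters, so this collects every
# character that occurs more than once in the word (once per occurrence).
by_char = [ln for ln in line if line.count(ln) > 1]

# str.split() on a single word returns the one-element list [line], and
# line.count(line) is 1, so this list is always empty for one-word lines.
by_word = [ln for ln in line.split() if line.count(ln) > 1]

# A shorter way to ask "does any character repeat?": a set drops duplicates,
# so it is shorter than the word exactly when some character repeats.
has_repeat = len(set(line)) < len(line)

print(by_char)     # ['c', 'h', 'e', 't', 'h', 'e', 'e', 't', 'c']
print(by_word)     # []
print(has_repeat)  # True
```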

來源

2017-05-22 05:32:53 kuro
