2017-04-03 44 views
3

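How to find doubled words in a file

I'm having a problem with some code. I'm trying to find doubled words in a file, such as "the" appearing twice in a row, and then print the line on which that happens. So far my code gets the line numbers right, but it reports every word that is repeated anywhere in the file, not only words repeated one right after another. What do I need to change so that it only counts doubled words?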

my_file = input("Enter file name: ")
lst = []
count = 1
with open(my_file, "r") as dup:
    for line in dup:
        linedata = line.split()
        for word in linedata:
            if word not in lst:
                lst.append(word)
            else:
                print("Found word: {} on line {}".format(word, count))
        count = count + 1
dup.close()
+0

Just reset 'lst = []' on each line iteration. –

+0

@Jean-FrançoisFabre, that would detect any repeated word, not only adjacent ones. – Maciek
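
To make the difference raised in these comments concrete, here is a minimal sketch. The function names `repeated_in_line` and `adjacent_in_line` and the sample sentence are hypothetical, introduced only for illustration. The first function resets the list for every line, the second compares each word with the one just before it:

```python
def repeated_in_line(line):
    """Words that occur again anywhere later in the same line (list reset per line)."""
    seen = []
    repeats = []
    for word in line.split():
        if word in seen:
            repeats.append(word)
        else:
            seen.append(word)
    return repeats


def adjacent_in_line(line):
    """Words that immediately follow an identical word."""
    doubles = []
    last_word = None
    for word in line.split():
        if word == last_word:
            doubles.append(word)
        last_word = word
    return doubles


sample = "the cat saw the the dog and the cat"
print(repeated_in_line(sample))  # ['the', 'the', 'the', 'cat']
print(adjacent_in_line(sample))  # ['the']
```

Resetting the list per line still flags every later occurrence of a word within that line, while the previous-word comparison flags only the true double.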

Answers

0

Here is just a plain answer to the question that was asked:

"What do I need to change so that it only counts doubled words?"

Here you are:

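my_file = input("Enter file name: ")
count = 0
with open(my_file, "r") as dup:
    for line in dup:
        count = count + 1  # current line number, starting at 1
        linedata = line.split()
        lastWord = ''  # reset per line, so doubles spanning a line break are not reported
        for word in linedata:
            if word == lastWord:
                print("Found word: {} on line {}".format(word, count))
            lastWord = word
# no explicit dup.close() needed: the with block closes the file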
1
my_file = input("Enter file name: ")
with open(my_file, "r") as dup:
    for line_num, line in enumerate(dup, start=1):  # start=1 gives 1-based line numbers
        words_in_line = line.split()
        # words_in_line[1:] is shifted by one, so words_in_line[i] is the word just before `word`
        duplicates = [word for i, word in enumerate(words_in_line[1:]) if words_in_line[i] == word]
        # now you have a list of duplicated words in line in duplicates
        # do whatever you want with it
+1

It should be 'words_in_line[i]' because enumerate already starts at 0 ;) – swenzel

+0

@swenzel You're right, thanks! Fixed it now. – Maciek
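
The index bookkeeping discussed in these comments can be avoided by pairing each word with its successor. This is a minimal sketch, assuming a hypothetical helper name `adjacent_duplicates` that is not part of the answer above:

```python
def adjacent_duplicates(line):
    """Return each word that directly repeats the word before it."""
    words = line.split()
    # zip(words, words[1:]) yields (previous, current) pairs,
    # so no index arithmetic is needed
    return [curr for prev, curr in zip(words, words[1:]) if prev == curr]


print(adjacent_duplicates("this is is a test test line"))  # ['is', 'test']
```

Because `zip` stops at the shorter input, an empty line or a one-word line simply yields an empty list.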

0

Put the code below into a file named THISfile.py and run it to see what it does:

# myFile = input("Enter file name: ")
# line No 2: line with with double 'with'
# line No 3: double (word , word) is not a double word
myFile="THISfile.py"
lstUniqueWords = []
noOfFoundWordDoubles = 0
totalNoOfWords  = 0
lineNo    = 0
lstLineNumbersWithWordDoubles = []
with open(myFile, "r") as myFile:
    for line in myFile:
        lineNo+=1 # memorize current line number
        lineWords = line.split()
        if len(lineWords) > 0: # scan line only if it contains words
            currWord = lineWords[0] # remember already 'visited' word
            totalNoOfWords += 1
            if currWord not in lstUniqueWords:
                lstUniqueWords.append(currWord)
                # put 'visited' word word into lstAllWordsINmyFile (if it is not already there)
            lastWord = currWord # we are done with current, so current becomes last one
            if len(lineWords) > 1 : # proceed only if line has two or more words
                for word in lineWords[1:] : # loop over all other words
                    totalNoOfWords += 1
                    currWord = word
                    if currWord not in lstUniqueWords:
                        lstUniqueWords.append(currWord)
                        # put 'visited' word into lstAllWordsINmyFile (if it is not already there)
                    if(currWord == lastWord): # duplicate word found:
                        noOfFoundWordDoubles += 1
                        print("Found double word: ['{}'] in line {}".format(currWord, lineNo))
                        lstLineNumbersWithWordDoubles.append(lineNo)
                    lastWord = currWord
                    # ^--- now after all all work is done, the currWord is considered lastWord
print(
    "noOfDoubles", noOfFoundWordDoubles, "\n",
    "totalNoOfWords", totalNoOfWords, "uniqueWords", len(lstUniqueWords), "\n",
    "linesWithDoubles", lstLineNumbersWithWordDoubles
)

The output should be:

Found double word: ['with'] in line 2 
Found double word: ['word'] in line 19 
Found double word: ['all'] in line 33 
noOfDoubles 3 
totalNoOfWords 221 uniqueWords 111 
linesWithDoubles [2, 19, 33] 

Now you can read the comments in the code to get a better idea of how it works. Have fun coding :)
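
For comparison, the same statistics can be gathered more compactly. The following is only a sketch under stated assumptions: the function name `scan_doubles` and the in-memory sample text are hypothetical and not part of the answer above, and a set replaces the list of unique words.

```python
import io


def scan_doubles(lines):
    """Scan an iterable of lines.

    Returns (doubles, total_words, unique_word_count), where doubles is a
    list of (line_number, word) for each word that repeats the previous
    word on the same line.
    """
    doubles = []
    unique_words = set()
    total_words = 0
    for line_no, line in enumerate(lines, start=1):
        words = line.split()
        total_words += len(words)
        unique_words.update(words)
        # compare every word with the one directly before it
        for prev, curr in zip(words, words[1:]):
            if prev == curr:
                doubles.append((line_no, curr))
    return doubles, total_words, len(unique_words)


# hypothetical sample text standing in for a real file
sample = io.StringIO("a line with with a double\nno doubles here\nall all done\n")
doubles, total, unique = scan_doubles(sample)
print(doubles)        # [(1, 'with'), (3, 'all')]
print(total, unique)  # 12 9
```

Passing an open file object instead of the `io.StringIO` sample works the same way, since both are iterated line by line.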
