2017-04-07

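Python: word frequency in a file

I wrote a simple word-count program in Python that reads a text file, counts word frequencies, and writes the results to another file. The problem is that when I search for "windows" and the text file contains a word such as "xwindows", that word gets counted as well.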

import sys
import glob
import errno

files = glob.glob('w.asm')
the_count = ['windows']
for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read()
        print contents
        for number in the_count:
            print "windows occured-", contents.count(number)

The file w.asm contains:

windows 
iwindows 
qwindows 
hwindows 
kwindows 
windows 
windows 
windowsh 
wwindows 
windows 
iwindows 
qwindows 
hwindows 
kwindows 

Output:

Occurences in file -- w.asm 

windows 
iwindows 
qwindows 
hwindows 
kwindows 
windows 
windows 
windowsh 
wwindows 
windows 
iwindows 
qwindows 
hwindows 
kwindows 
windows occured- 14 

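The output I actually want is 4, because the word "windows" occurs only 4 times on its own, but the code reports 14.

Any help would be appreciated.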

Answer


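14 is actually correct for what the code does: contents is a single string, so count() counts substring matches, and words such as windowsh contain the substring windows. A simple fix is to split the file contents into words first and then call count() on the resulting list, which only matches whole elements.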

for name in files:
    with open(name) as f:
        print "Occurences in file -- %s " % name
        contents = f.read().split()  # <--- split on whitespace into a list of words
        print contents
        for number in the_count:
            # list.count() matches whole elements only, not substrings
            print "windows occured-", contents.count(number)
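
For anyone on Python 3, here is a minimal self-contained sketch of the same idea. The names `SAMPLE`, `count_exact` and `count_with_boundaries` are made up for illustration and are not from the question. The regex variant is an alternative I am adding, not part of the fix above: it may help if the words are followed by punctuation, since `split()` only breaks on whitespace.

```python
import re

# Same 14 lines as w.asm in the question, inlined so the sketch runs on its own.
SAMPLE = """windows
iwindows
qwindows
hwindows
kwindows
windows
windows
windowsh
wwindows
windows
iwindows
qwindows
hwindows
kwindows
"""


def count_exact(text, word):
    """Count whitespace-separated tokens that are exactly equal to word."""
    return text.split().count(word)


def count_with_boundaries(text, word):
    """Count occurrences of word delimited by regex word boundaries (\\b)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


print(SAMPLE.count("windows"))         # substring matches: 14
print(count_exact(SAMPLE, "windows"))  # whole-word matches: 4

# With punctuation attached, split() leaves tokens like "windows," behind,
# so the exact-token count misses them while the regex version does not.
sentence = "Open windows, then close windows."
print(count_exact(sentence, "windows"))            # 0
print(count_with_boundaries(sentence, "windows"))  # 2
```

Which one fits depends on the input: for a file with one word per line, as in the question, the plain `split()` approach is enough.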

Thank you very much, it works.