2011-03-21

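Finding file names (from a directory) in a file

I want to check whether every file of a certain type has been logged by my program. Basically, I have a log file that contains only file names, and I use a function to walk the files on disk and check whether each one appears in the log. The contents are huge, and I went about this in a crude, brute-force way. Unfortunately, it does not work properly.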

import subprocess 
import sys 
import signal 
import shutil 
import os, fnmatch 


#open file to read 
f=open("logs", "r") #files are stored in this directory 
o=open("all_output_logs","w") 
e=open("missing_logs",'w') 


def locate(pattern, root=os.curdir):
    '''Locate all files matching supplied filename pattern in and below
    supplied root directory.'''
    # ignore directories - ignore works, just uncomment.
    #ignored = ["0201", "0306"]
    for path, dirs, files in os.walk(os.path.abspath(root)):
        #for dir in ignored:
        #    if dir in dirs:
        #        dirs.remove(dir)
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)



# here I log all the files in the output file to search in
for line in f:
    if line.startswith("D:"):
        filename = line
        #print line
        o.write(filename)

f.close() 
o.close() 
r.close() 

i=open("all_output_logs","r")
# primitive search: going through each file in the directory to see if it is there in the log file
for filename in locate("*.dll"):
    for line in i:
        if filename in i:
            count=count+1
            print count
        else:
            e.write(filename)

I never see my dummy variable count being printed, and I end up with only a single file name, one from the middle of the list.

Answer

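The problem is that the lines of a file can only be read in a single pass, and a file object (i in your case) does not support the in operator the way you expect: filename in i iterates over the remaining lines and compares each whole line, trailing newline included, to filename, so the file is exhausted after the first check. You could change your code to something like this: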

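# Read the log once; a list can be scanned as many times as needed.
with open("all_output_logs", "r") as log:
    lines = log.readlines()

count = 0
for filename in locate("*.dll"):
    for line in lines:
        if filename in line:
            count += 1
            print(count)
            break
    else:
        # No log line mentioned this file, so record it once as missing.
        e.write(filename + "\n")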

But this is still inefficient and a bit awkward.
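The single-pass behaviour described above can be checked with a small experiment. This is only a sketch: the temporary file and the two paths in it are made up for illustration and are not taken from the question.

```python
import os
import tempfile

# Build a small throwaway log file (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("D:\\app\\a.dll\nD:\\app\\b.dll\n")
    log_path = tmp.name

with open(log_path, "r") as log:
    first_pass = [line for line in log]   # consumes the whole file
    second_pass = [line for line in log]  # nothing is left to read

with open(log_path, "r") as log:
    # "in" on a file object iterates over the lines and compares whole
    # lines, trailing newline included, so the bare path does not match.
    bare_match = "D:\\app\\a.dll" in log

with open(log_path, "r") as log:
    newline_match = "D:\\app\\a.dll\n" in log

os.remove(log_path)

print(len(first_pass), len(second_pass))  # 2 0
print(bare_match, newline_match)          # False True
```

The second pass is empty because the file position is already at the end, which is why the nested loop in the question stops producing matches after the first file name.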

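Since you said the log file is "huge", you probably do not want to read all of it into memory, so you will have to rewind the file for each lookup: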

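count = 0
f = open("all_output_logs", "r")
for filename in locate("*.dll"):
    f.seek(0)  # rewind so the log can be read again for this file
    for line in f:
        if filename in line:
            count += 1
            print(count)
            break
    else:
        # No log line mentioned this file, so record it once as missing.
        e.write(filename + "\n")
f.close()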

I kept the in operator because you did not specify what each line of the log file contains. One would expect filename == line.strip() to be the correct comparison.
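If each log line really is exactly one path, so that filename == line.strip() is the right test, the nested loop can be replaced by a set lookup. The following is a minimal sketch under that assumption; find_missing and the sample data are hypothetical names introduced here, and the set still holds every logged path in memory, which may or may not be acceptable for a huge log.

```python
def find_missing(candidates, log_lines):
    """Return the candidates that do not appear as an exact log line."""
    # Strip the trailing newline (and surrounding whitespace) once per line.
    logged = set(line.strip() for line in log_lines)
    return [name for name in candidates if name not in logged]


# Hypothetical data standing in for the log file and for locate("*.dll").
log_lines = ["D:\\app\\a.dll\n", "D:\\app\\b.dll\n"]
candidates = ["D:\\app\\a.dll", "D:\\app\\c.dll"]

missing = find_missing(candidates, log_lines)
print(missing)
```

Here missing contains only the c.dll path. With the real data, log_lines could be the open log file itself and candidates could be locate("*.dll"). The comparison is exact and case-sensitive, so paths that differ only in letter case would be reported as missing.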