2013-07-09 248 views
2

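Python - files with the same name but different contents

I have two folders, dir1 and dir2. I need to find the files that have the same name in both folders (or in their subfolders) but different contents.

For example, so.1.0/P/Q/SEARCH.C and so.1.1/P/Q/SEARCH.C differ.

Any ideas?

This is how I collect the files I need: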

import os, sys, fnmatch, filecmp

folder1 = sys.argv[1]
folder2 = sys.argv[2]

filelist1 = []
filelist2 = []

# Collect the .c and .h files under folder1
for root, dirs, files in os.walk(folder1):
    for filename in fnmatch.filter(files, '*.c'):
        filelist1.append(os.path.join(root, filename))

for root, dirs, files in os.walk(folder1):
    for filename in fnmatch.filter(files, '*.h'):
        filelist1.append(os.path.join(root, filename))

# Collect the .c and .h files under folder2
for root, dirs, files in os.walk(folder2):
    for filename in fnmatch.filter(files, '*.c'):
        filelist2.append(os.path.join(root, filename))

for root, dirs, files in os.walk(folder2):
    for filename in fnmatch.filter(files, '*.h'):
        filelist2.append(os.path.join(root, filename))

Now I want to compare the two lists, pick out the entries that share the same file name, and check whether their contents differ. What do you think?

[What have you tried](http://mattgemmell.com/2008/12/08/what-have-you-tried/)? – stalk

Answers

1

As @Martijn answered, you can use os.walk() for the traversal:

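import os
for root, dirs, files in os.walk(path):  # path is the directory to traverse
    for name in files:
        print(os.path.join(root, name))  # full path of each file found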

And to compare two files given their names, I would suggest filecmp:

>>> import filecmp 
>>> filecmp.cmp('undoc.rst', 'undoc.rst') 
True 
>>> filecmp.cmp('undoc.rst', 'index.rst') 
False 
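One caveat: filecmp.cmp() defaults to shallow=True, in which case two files whose os.stat() signatures (type, size, modification time) match are reported as equal without being read. A minimal sketch of forcing a real content comparison, assuming Python 3; the helper name same_contents and the temporary files are made up for illustration:

```python
import filecmp
import os
import tempfile

def same_contents(path_a, path_b):
    """Return True if both files hold identical bytes (hypothetical helper)."""
    # shallow=False makes filecmp read the files instead of trusting os.stat()
    return filecmp.cmp(path_a, path_b, shallow=False)

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, 'a.c')
    b = os.path.join(tmp, 'b.c')
    c = os.path.join(tmp, 'c.c')
    for path, text in ((a, 'file one\n'), (b, 'file one\n'), (c, 'file two\n')):
        with open(path, 'w') as f:
            f.write(text)
    print(same_contents(a, b))  # True
    print(same_contents(a, c))  # False
```

The third file has the same size as the first two, which is exactly the case where only reading the contents can tell the files apart.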

And for comparing the file contents, check out difflib.
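As a rough sketch of what that could look like, assuming Python 3 and text files readable with the default encoding, difflib.unified_diff() can show where two files differ. The helper name show_diff is hypothetical:

```python
import difflib

def show_diff(path_a, path_b):
    """Return a unified diff of two text files as one string (hypothetical helper)."""
    with open(path_a) as fa, open(path_b) as fb:
        lines_a = fa.readlines()
        lines_b = fb.readlines()
    # unified_diff yields nothing when the files are identical,
    # so the joined result is an empty string in that case
    return ''.join(
        difflib.unified_diff(lines_a, lines_b, fromfile=path_a, tofile=path_b)
    )
```

Calling show_diff('so.1.0/P/Q/SEARCH.C', 'so.1.1/P/Q/SEARCH.C') would then return the changed lines prefixed with - and +, or an empty string if the two files match.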

2

Use os.walk() to generate the list of files in either directory, with paths relative to that directory's own root:

import os

def relative_files(path):
    """Generate filenames with pathnames relative to the initial path."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)
        for filename in files:
            yield os.path.join(relroot, filename)

Build a set of paths from one of the roots:

root_one = 'so.1.0' # use an absolute path here 
root_two = 'so.1.1' # use an absolute path here 
files_one = set(relative_files(root_one)) 

Then use a set intersection to find all the pathnames that also exist under the other root:

from itertools import izip_longest  # Python 2 name; Python 3 calls it zip_longest

def different_files(root_one, root_two):
    """Yield files that differ between the two roots

    Generate pathnames relative to root_one and root_two that are present in both
    but have different contents.

    """
    files_one = set(relative_files(root_one))
    for same in files_one.intersection(relative_files(root_two)):
        # same is a relative path, so same file in different roots
        with open(os.path.join(root_one, same)) as f1, open(os.path.join(root_two, same)) as f2:
            if any(line1 != line2 for line1, line2 in izip_longest(f1, f2)):
                # lines don't match, so files don't match!
                yield same

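itertools.izip_longest() pairs up the lines efficiently while looping over the two files; if one file is longer than the other, its remaining lines are paired with None, which ensures that the files are still detected as different.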
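The code above targets Python 2. A possible Python 3 port, offered as a sketch rather than the answer's own code: it swaps in itertools.zip_longest, opens the files in binary mode so that undecodable bytes cannot raise an error, and sorts the common paths for a stable order. The last two choices are assumptions added here:

```python
import os
from itertools import zip_longest

def relative_files(path):
    """Generate filenames with pathnames relative to the initial path."""
    for root, dirnames, files in os.walk(path):
        relroot = os.path.relpath(root, path)
        for filename in files:
            yield os.path.join(relroot, filename)

def different_files(root_one, root_two):
    """Yield relative paths present under both roots whose contents differ."""
    files_one = set(relative_files(root_one))
    # sorted() is an addition for deterministic output order
    for same in sorted(files_one.intersection(relative_files(root_two))):
        # binary mode (an assumption): compare raw bytes, line by line
        with open(os.path.join(root_one, same), 'rb') as f1, \
             open(os.path.join(root_two, same), 'rb') as f2:
            # a shorter file is padded with None, so a length mismatch counts
            if any(line1 != line2 for line1, line2 in zip_longest(f1, f2)):
                yield same
```

In Python 3 the loop in the demo would use print(different) instead of the print statement.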

Demo:

$ mkdir -p /tmp/so.1.0/p/q 
$ mkdir -p /tmp/so.1.1/p/q 
$ echo 'file one' > /tmp/so.1.0/p/q/search.c 
$ echo 'file two' > /tmp/so.1.1/p/q/search.c 
$ echo 'file three' > /tmp/so.1.1/p/q/ignored.c 
$ echo 'matching' > /tmp/so.1.0/p/q/same.c 
$ echo 'matching' > /tmp/so.1.1/p/q/same.c 

>>> for different in different_files('/tmp/so.1.0', '/tmp/so.1.1'): 
...     print different
... 
p/q/search.c 

Too fast :) Nice –
