2013-06-19 139 views

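Shortening Python code

I have a feeling this Python code could be shortened considerably, but I almost always fall back to writing a C-style layout. In your opinion, what is the best way to shorten it? Readability is a bonus, not a requirement.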

def compfiles(file1, file2):
    linecnt = 0
    for line1 in open(file1):
        line1 = line1.strip()
        hit = False
        for line2 in open(file2):
            line2 = line2.strip()
            if line2 == line1:
                hit = True
                break
        if not hit:
            print("Miss: file %s contains '%s', but file %s does not!" % (file1, line1, file2))
        linecnt += 1
    print("%i lines compared between %s and %s." % (linecnt, file1, file2))

fn = ["file1.txt", "file2.txt"]
compfiles(fn[0], fn[1])
compfiles(fn[1], fn[0])

If the code works, the right place to ask how to improve it is http://codereview.stackexchange.com/

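Would you really open the same file over and over in C without closing it, instead of just seeking back to the start? I'd think not. Before trying to shorten it, I would look for a better algorithm (one that isn't O(n**2)).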
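
The "open once and seek back to the start" idea from this comment can be sketched as follows. This is a minimal sketch (compfiles_seek is a hypothetical name): it keeps the question's nested O(n*m) scan and only stops reopening file2, so on its own it does not address the algorithmic concern.

```python
def compfiles_seek(file1, file2):
    # Hypothetical variant of the question's function: file2 is opened once
    # and rewound for every line of file1, instead of being reopened each time.
    linecnt = 0
    with open(file1) as f1, open(file2) as f2:
        for line1 in f1:
            line1 = line1.strip()
            f2.seek(0)  # rewind file2 to its start before each scan
            # any() stops at the first matching line, like the original break
            if not any(line2.strip() == line1 for line2 in f2):
                print("Miss: file %s contains '%s', but file %s does not!" % (file1, line1, file2))
            linecnt += 1
    print("%i lines compared between %s and %s." % (linecnt, file1, file2))
```

Both files are closed automatically when the with block ends, which the original loop never did.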

@gnibbler: It's a quick and dirty hack for files of just over 600 lines; caching handles it and it runs in under 1 second, no problem. :)

Answers

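Your code is very inefficient because you open and loop over the second file once for every line of the first file. Just read the second file into a list (or, better, a set, which gives O(1) lookup time on average) and use the in operator. Also, your linecnt variable just counts the lines in file1; you can read the lines into a list and call len on that list to get the same number: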

def compfiles(file1, file2):
    # Iterating the file (rather than read().split("\n")) avoids a spurious
    # empty last element when the file ends with a newline.
    with open(file1) as f:
        lines1 = [l.strip() for l in f]
    with open(file2) as f:
        lines2 = set(l.strip() for l in f)  # set: O(1) average membership test
    for line in lines1:
        if line not in lines2:
            print("Miss: file %s contains '%s', but file %s does not!" % (file1, line, file2))
    print("%i lines compared between %s and %s." % (len(lines1), file1, file2))
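
The core of this answer (build the lookup set from file2 once, test membership with in, take the count from len) can be separated from the printing so it is easier to check. A minimal sketch, assuming a hypothetical helper missing_lines that accepts any iterables of lines; io.StringIO stands in for the real files:

```python
import io

def missing_lines(lines1, lines2):
    """Hypothetical helper: return (lines of lines1 absent from lines2, len(lines1))."""
    lookup = {l.strip() for l in lines2}   # built once; average O(1) membership tests
    stripped = [l.strip() for l in lines1]
    return [l for l in stripped if l not in lookup], len(stripped)

f1 = io.StringIO("alpha\nbeta\ngamma\n")
f2 = io.StringIO("gamma\nalpha\n")
print(missing_lines(f1, f2))  # (['beta'], 3)
```

Passing open file objects instead of StringIO works the same way, since both yield lines when iterated.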

This is still O(N**2) comparisons.

@gnibbler I suppose file2 could be turned into a set, which would make the lookup 'O(1)'... – l4mpi

def compfiles(file1, file2):
    with open(file1) as fin:
        set1 = set(fin)
    with open(file2) as fin:
        set2 = set(fin)
    ...  # do some set operations
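
One way the set-operations placeholder might be filled in, assuming set difference in both directions is what's wanted, so a single call reports misses for both files instead of the question's two calls. This is a sketch, not the answerer's code: compfiles_sets is a hypothetical name, and the strip() calls are an addition to match the question's comparison (without them a last line lacking a trailing newline would never match). Duplicates and line order are lost; sorted() only makes the output deterministic.

```python
def compfiles_sets(file1, file2):
    # Strip each line so trailing newlines/whitespace don't cause false misses.
    with open(file1) as fin:
        set1 = {line.strip() for line in fin}
    with open(file2) as fin:
        set2 = {line.strip() for line in fin}
    # Set difference: distinct lines present in one file but not the other.
    for line in sorted(set1 - set2):
        print("Miss: file %s contains '%s', but file %s does not!" % (file1, line, file2))
    for line in sorted(set2 - set1):
        print("Miss: file %s contains '%s', but file %s does not!" % (file2, line, file1))
```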

If the files have duplicate lines or the order matters, iterate over file1 instead:

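def compfiles(file1, file2):
    with open(file2) as fin:
        set2 = {line.strip() for line in fin}  # strip so lines compare like the OP's
    count = 0  # stays 0 if file1 is empty (i would be undefined otherwise)
    with open(file1) as fin:
        for count, line in enumerate(fin, 1):
            line = line.strip()
            if line not in set2:
                print("Miss: file %s contains '%s', but file %s does not!" % (file1, line, file2))
    print("%i lines compared between %s and %s." % (count, file1, file2))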
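
How the two versions differ on duplicate lines can be seen with in-memory files. A minimal sketch; both function names are hypothetical and io.StringIO stands in for real files:

```python
import io

def misses_by_iteration(f1, f2):
    """Iterate file1: every occurrence (duplicates too) is checked, in file order."""
    set2 = {line.strip() for line in f2}
    return [line.strip() for line in f1 if line.strip() not in set2]

def misses_by_set_difference(f1, f2):
    """Pure set difference: each distinct missing line appears once, order is lost."""
    return {line.strip() for line in f1} - {line.strip() for line in f2}

text1 = "a\nb\nb\nc\n"
text2 = "a\n"
print(misses_by_iteration(io.StringIO(text1), io.StringIO(text2)))               # ['b', 'b', 'c']
print(sorted(misses_by_set_difference(io.StringIO(text1), io.StringIO(text2))))  # ['b', 'c']
```

Iterating file1 also lets the line count include duplicates, matching the question's linecnt.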

While this will work, it won't exactly reproduce the behavior of the OP's code if file1 contains duplicate lines. – l4mpi

Yes. Reading file2 into a set and iterating over file1 is probably better.