Shortening Python code

-4

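I have a feeling that this Python code could be shortened considerably, but I almost always fall back to writing a C-style layout. In your opinion, what is the best way to shorten it? Readability is a bonus, not a requirement.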

def compfiles(file1, file2):
    linecnt = 0
    for line1 in open(file1):
        line1 = line1.strip()
        hit = False
        for line2 in open(file2):
            line2 = line2.strip()
            if line2 == line1:
                hit = True
                break
        if not hit:
            print("Miss: file %s contains '%s', but file %s does not!" % (file1, line1, file2))
        linecnt += 1
    print("%i lines compared between %s and %s." % (linecnt, file1, file2))

fn = ["file1.txt", "file2.txt"]
compfiles(fn[0], fn[1])
compfiles(fn[1], fn[0])

Source

2013-06-19 Jonas Byström

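If the code works, the right place to ask how to improve it is http://codereview.stackexchange.com/

Would you really open the same file over and over in C without closing it, or would you just seek to the start? I thought not. Before trying to shorten it, I would try to find a better algorithm (one that is not O(n**2)).

@gnibbler: It is a quick and dirty hack for just over 600 lines; the cache handles it and it runs in <1 second, n.p. :)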

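Your code is very inefficient because you open and loop over the second file once for every line of the first file. Just read the second file into a list (or better, a set, which gives O(1) average lookup time) and use the in operator. Also, your linecnt variable only counts the number of lines in file1; you can read the lines into a list and call len on that list to get the same number: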

def compfiles(file1, file2):
    # Iterate over the file objects so a trailing newline does not add an empty last line.
    with open(file1) as f1:
        lines1 = [l.strip() for l in f1]
    with open(file2) as f2:
        lines2 = set(l.strip() for l in f2)  # a set gives O(1) average lookups
    for line in lines1:
        if line not in lines2:
            print("Miss: file %s contains '%s', but file %s does not!" % (file1, line, file2))
    print("%i lines compared between %s and %s." % (len(lines1), file1, file2))
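To illustrate the difference in lookup cost between a list and a set, here is a rough timing sketch. The data is made up (1000 synthetic lines, with none of the probes present, which is the worst case for a list), and `count_hits` is a hypothetical helper. The absolute numbers will vary by machine; only the relative difference matters.

```python
import timeit

# Hypothetical data: 1000 distinct "lines"; none of the probes is present,
# so every list lookup has to scan the whole list.
lines = [f"line {i}" for i in range(1000)]
probes = [f"missing {i}" for i in range(1000)]

as_list = lines
as_set = set(lines)

def count_hits(container):
    """Count how many probes are found in the container via the in operator."""
    return sum(1 for p in probes if p in container)

# Time one full pass of lookups against each container type.
list_time = timeit.timeit(lambda: count_hits(as_list), number=1)
set_time = timeit.timeit(lambda: count_hits(as_set), number=1)

print(f"list: {list_time:.4f}s, set: {set_time:.4f}s")
```

Both containers give the same answers; only the cost of each `in` test differs.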

Source

2013-06-19 12:39:24 l4mpi

This is still O(N**2) comparisons

@gnibbler I figured file2 could be turned into a set, which would make lookups O(1)... – l4mpi

def compfiles(file1, file2):
    with open(file1) as fin:
        set1 = set(fin)
    with open(file2) as fin:
        set2 = set(fin)
    ... # do some set operations
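The placeholder above leaves the set operations open. One possible choice, sketched here as an assumption rather than as the answerer's intent, is a set difference that yields the lines present in one file but not in the other. `missing_lines` is a hypothetical helper, the file contents are made up, and the lines are stripped as in the question's code.

```python
import os
import tempfile

def missing_lines(file1, file2):
    """Return the stripped lines that occur in file1 but not in file2."""
    with open(file1) as fin:
        set1 = {line.strip() for line in fin}
    with open(file2) as fin:
        set2 = {line.strip() for line in fin}
    # Set difference: elements of set1 that are absent from set2.
    return set1 - set2

# Usage with two throwaway files (contents are invented for illustration).
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, "file1.txt")
    path2 = os.path.join(tmp, "file2.txt")
    with open(path1, "w") as f:
        f.write("a\nb\nc\n")
    with open(path2, "w") as f:
        f.write("b\nc\nd\n")
    only_in_1 = missing_lines(path1, path2)
    only_in_2 = missing_lines(path2, path1)

print(sorted(only_in_1), sorted(only_in_2))
```

Because sets are unordered and hold each element once, this reports every missing line a single time and in no particular order.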

If the file has duplicate lines, or if order matters, iterate over file1 instead:

def compfiles(file1, file2):
    with open(file2) as fin:
        set2 = set(fin)
    with open(file1) as fin:
        i = -1  # stays -1 for an empty file1, so the reported count is 0
        for i, line in enumerate(fin):
            # Lines are compared including their trailing newline.
            if line not in set2:
                print("Miss: file %s contains '%s', but file %s does not!" % (file1, line.strip(), file2))
    print("%i lines compared between %s and %s." % (i + 1, file1, file2))
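To make the difference between the two approaches concrete, here is a small sketch with made-up file contents in which file1 contains a duplicate line. `read_stripped` is a hypothetical helper, and the lines are stripped as in the question's code.

```python
import os
import tempfile

def read_stripped(path):
    """Read a file and return its lines with surrounding whitespace removed."""
    with open(path) as fin:
        return [line.strip() for line in fin]

# Throwaway files: file1 has "x" twice, file2 only contains "a".
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, "file1.txt")
    path2 = os.path.join(tmp, "file2.txt")
    with open(path1, "w") as f:
        f.write("x\na\nx\nb\n")
    with open(path2, "w") as f:
        f.write("a\n")
    lines1 = read_stripped(path1)
    lines2 = set(read_stripped(path2))

# Set difference: each missing line once, in no guaranteed order.
by_difference = set(lines1) - lines2
# Iteration over file1: one entry per offending line, in file order.
by_iteration = [line for line in lines1 if line not in lines2]

print(sorted(by_difference))
print(by_iteration)
```

Iterating over file1 reports "x" twice and keeps the file order, which matches what the question's nested loop prints; the set difference collapses the duplicates.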

Source

2013-06-19 12:42:31

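While this would work, it wouldn't reproduce the exact behavior of the OP's code if file1 contains duplicate lines. – l4mpi

Yes. Reading file2 into a set and iterating over file1 is probably better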
