Using set difference to get missing values

I have two lists, and I use the function below to assign line numbers to the lines (similar to Unix nl):

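import fileinput
from math import log10

def nl(inFile):
    # Prefix each line with its line number, like Unix nl
    numberedLines = []
    for line in fileinput.input(inFile):
        numberedLines.append(str(fileinput.lineno()) + ': ' + line)
    # Digits in the largest line number, used to right-align the numbers
    numberWidth = int(log10(fileinput.lineno())) + 1
    for i, line in enumerate(numberedLines):
        num, rest = line.split(':', 1)
        fnum = str(num).rjust(numberWidth)
        numberedLines[i] = ':'.join([fnum, rest])
    # Note: this returns one joined string, not a list of lines
    return ''.join(numberedLines)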

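This returns lists like:

1: 12 14
2: 20 49
3: 21 28

With the infile I'm using, the line numbers are very important. My second list has the same structure, but its line numbers don't mean anything. I need to find the difference against the second file and return the line numbers from the first file. For example, if the second file has:

5: 12 14
48: 20 49

then I only want to return 3, the line number of the value in the first list that is missing from the second.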
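Below is a minimal sketch of what is being asked for. It assumes every line has the form 'N: value', as nl() produces (the number possibly left-padded with spaces). The example data is held in lists instead of being read from files. The names parse_numbered and missing_line_numbers are hypothetical.

```python
def parse_numbered(lines):
    """Turn lines like ' 3: 21 28' into (3, '21 28') pairs (hypothetical helper)."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        num, _, rest = line.partition(':')  # split on the first ':' only
        pairs.append((int(num), rest.strip()))  # int() tolerates the padding
    return pairs


def missing_line_numbers(master_lines, compare_lines):
    """Master line numbers whose value appears nowhere in the compare list."""
    compare_values = {value for _, value in parse_numbered(compare_lines)}
    return [num for num, value in parse_numbered(master_lines)
            if value not in compare_values]


master = [" 1: 12 14", " 2: 20 49", " 3: 21 28"]  # line numbers are significant
compare = [" 5: 12 14", "48: 20 49"]               # line numbers are ignored
print(missing_line_numbers(master, compare))  # [3]
```

Putting the compare values in a set gives an average constant-time membership test. The master line numbers survive because they are read from the master lines themselves rather than recomputed.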

Here's what I tried:

oldtxt = 'master_list.txt' # Line numbers are significant 
newFile = 'list2compare.txt' # Line numbers don't matter 

s = set(nl(oldtxt)) 
diff = [x for x in (newFile) if x not in s] 
print diff

This returns ['12 14\n', '20 49\n', '21 28\n'], which is obviously not what I need. Any ideas?
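The attempt has two separate problems:

- nl() returns one joined string, so set() splits it into individual characters.
- Iterating over newFile walks the characters of the file name, not the lines of the file.

A small illustration, using a literal string as a stand-in for nl()'s return value (an assumption; no files are read):

```python
numbered = " 1: 12 14\n 2: 20 49\n 3: 21 28\n"  # stand-in for what nl() returns

# set() of a string is a set of single characters, not of lines
chars = set(numbered)
print(sorted(chars))

# Iterating over a file *name* yields the characters of the name
newFile = 'list2compare.txt'
print(list(newFile)[:4])  # ['l', 'i', 's', 't']

# Splitting into lines first gives the units that should be compared
lines = numbered.splitlines()
print(lines)  # [' 1: 12 14', ' 2: 20 49', ' 3: 21 28']
```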


2012-09-27 KennyC

How about the following:

f1 = """\
12 14
20 49
21 28
"""

f2 = """\
12 14
20 49
"""

def parse(lines):
    "Turn a list of lines into a dict of line number => value (blank lines skipped, numbering kept)"
    return dict((i, v) for i, v in enumerate(lines, 1) if v)

def diff(a, b):
    """
    Given two dicts from parse(), go through each lineno => value pair in a and
    discard it if the value is among b's values; finally, return the remaining
    lineno => value pairs
    """
    bvals = frozenset(b.values())
    return dict((ak, av) for ak, av in a.items() if av not in bvals)

def fmt(d):
    "Turn lineno => value pairs into ' lineno: value' strings"
    if not d:
        return []
    nw = len(str(max(d.keys())))  # width of the widest line number
    return ["{0:>{1}}: {2}".format(k, nw, v) for k, v in sorted(d.items())]

d1 = parse(f1.splitlines())
print(d1)
print()
d2 = parse(f2.splitlines())
print(d2)
print()
d = diff(d1, d2)
print(d)
print()
print("\n".join(fmt(d)))

This gives me the output:

{1: '12 14', 2: '20 49', 3: '21 28'} 

{1: '12 14', 2: '20 49'} 

{3: '21 28'} 

3: 21 28
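The same diff can be phrased as the set difference from the question title. Subtract the comparison values from the master values, then map the surviving values back to master line numbers. This sketch reuses the dict shapes printed above, typed in literally so the block runs on its own:

```python
d1 = {1: '12 14', 2: '20 49', 3: '21 28'}  # master: lineno -> value
d2 = {1: '12 14', 2: '20 49'}              # compare: lineno -> value

# Values present in the master list but absent from the compare list
missing_values = set(d1.values()) - set(d2.values())

# Map back to master line numbers; a duplicated value keeps every line it sits on
missing_lines = sorted(k for k, v in d1.items() if v in missing_values)
print(missing_lines)  # [3]
```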


2012-09-27 15:58:24 spiralx

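Thank you. Using your idea I get back things like '1: 1: 0 2', which lists every line of the master file but doesn't show any differences. So it's '1:' from the master list, '1:' from the compare list, and then the actual numbers. – KennyC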

I've added comments and the output at each stage; does that help? Oh, and I fixed a bug in the fmt() function that was messing up the formatting. – spiralx

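I'll take a stab at this ;) It sounds like you want the line numbers of master-file lines whose contents also appear in the compare file. Is that what you're after? If so, I'd suggest...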

Master file contents...

1 2 3 4 
test 
6 7 8 9 
compare 
me

Compare file contents...

6 7 8 9 
10 11 12 13 
me

Code:

# Replace 'file path' with the actual paths of the two files
with open('file path') as f:
    lines_master = f.read().splitlines()
with open('file path') as f:
    lines_compare = set(f.read().splitlines())  # set for fast membership tests

same_lines = []
for i, line in enumerate(lines_master):
    if line in lines_compare:
        same_lines.append(i + 1)  # report 1-based line numbers

print(same_lines)

The result is [3, 5].
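same_lines holds the master lines that do appear in the compare file. The question asks for the opposite, which is its complement. Here is a sketch of that complement as a set difference over line numbers. It holds the two example files above in lists, an assumption made in place of reading 'file path':

```python
lines_master = ['1 2 3 4', 'test', '6 7 8 9', 'compare', 'me']
lines_compare = ['6 7 8 9', '10 11 12 13', 'me']

compare_set = set(lines_compare)
same_lines = [i + 1 for i, line in enumerate(lines_master) if line in compare_set]

# Every master line number minus the ones that matched
all_lines = set(range(1, len(lines_master) + 1))
missing = sorted(all_lines - set(same_lines))
print(same_lines)  # [3, 5]
print(missing)     # [1, 2, 4]
```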


2012-09-27 16:04:44 b10hazard

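@radio Thank you. With your method I get back an empty list, even though there are certainly matches. The problem may be that the two files never have the same line numbers... the matching values just sit at various line numbers. – KennyC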

You can use difflib for this:

>>> f1 = """1 2 3 4
... test
... 6 7 8 9
... compare
... me
... """
>>>
>>> f2 = """6 7 8 9
... 10 11 12 13
... me
... """
>>>
>>> import difflib
>>> for line in difflib.ndiff(f1.splitlines(), f2.splitlines()):
...     if line.startswith('-'):
...         print("Second file is missing line: '%s'" % line)
...     if line.startswith('+'):
...         print("Second file contains additional line: '%s'" % line)
...
Second file is missing line: '- 1 2 3 4'
Second file is missing line: '- test'
Second file is missing line: '- compare'
Second file contains additional line: '+ 10 11 12 13'


2012-09-27 16:13:59 jterrace

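Thanks. Unfortunately, I believe this example actually compares the line numbers as well: a crude test shows it treats 'f1 = 1: 1 10' as different from 'f2 = 8: 1 10', and I need to ignore the fact that the line numbers disagree. – KennyC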

Yes, you'd have to strip off the line numbers first. – jterrace
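Here is a sketch of that suggestion, assuming each line looks like 'N: value'. It strips the prefix from both lists, runs difflib.ndiff on the bare values, and counts master lines while walking the diff. That way each '- ' entry can be reported with its original master line number. The helper name strip_numbers is hypothetical.

Note that ndiff aligns the two sequences, so unlike a set difference it is sensitive to order. A value that was moved rather than deleted may show up as a '- ' line plus a '+ ' line.

```python
import difflib


def strip_numbers(lines):
    """Drop the 'N:' prefix from each line (hypothetical helper)."""
    return [line.partition(':')[2].strip() for line in lines]


master = [" 1: 12 14", " 2: 20 49", " 3: 21 28"]
compare = [" 5: 12 14", "48: 20 49"]

missing = []
master_lineno = 0
for entry in difflib.ndiff(strip_numbers(master), strip_numbers(compare)):
    code = entry[:2]
    if code in ('  ', '- '):
        master_lineno += 1  # this entry consumes one master line
    if code == '- ':
        missing.append(master_lineno)
    # '+ ' entries come only from compare; '? ' entries are intraline hints
print(missing)  # [3]
```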
