2013-06-03

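Getting the min and max frame of items sharing the same id when two files differ?

I have two files that look like this, with a few differences between them: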

First file:

{16:[3, [-7, 87, 20, 32]]} 
{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]} 
{18:[2, [-1, 88, 16, 28], 3, [-3, 84, 20, 32]]} 
{19:[2, [1, 89, 16, 28], 3, [-2, 85, 20, 32]]} 
{20:[2, [9, 94, 16, 28], 3, [1, 85, 20, 32]]} 
{21:[2, [12, 96, 16, 28], 3, [2, 76, 19, 31]]} 
{22:[2, [15, 97, 16, 28], 3, [4, 73, 19, 29]]} 
{23:[2, [18, 96, 16, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]} 
{24:[2, [22, 97, 16, 28], 3, [9, 71, 19, 27], 10, [-5, 63, 49, 78]]} 
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]} 
{26:[2, [29, 101, 16, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]} 

Second file:

{16:[3, [-7, 86, 20, 32]]} 
{17:[2, [-3, 82, 16, 28], 3, [-6, 84, 20, 32]]} 
{18:[2, [-1, 88, 16, 27], 3, [-3, 84, 20, 32]]} 
{19:[2, [1, 89, 16, 28], 3, [-2, 84, 20, 32]]} 
{20:[2, [9, 94, 15, 28], 3, [1, 85, 20, 32]]} 
{21:[2, [12, 96, 16, 28], 3, [1, 76, 19, 31]]} 
{22:[2, [15, 97, 17, 28], 3, [4, 73, 19, 29]]} 
{23:[2, [18, 96, 18, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]} 
{24:[2, [22, 97, 16, 28], 3, [9, 71, 20, 27], 10, [-5, 63, 49, 78]]} 
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]} 
{26:[2, [29, 101, 17, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]} 

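I compare the two with difflib and print the lines that differ between them. What I want to do is print the minimum and maximum frame values that share the same id.

The frame is the key on each line, so in this case the frames range from 16 to 26. The id is the value that precedes each 4-value list. So the id on the first line is 3. The second line has two ids, 2 and then 3.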
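To make the layout concrete, here is a minimal sketch (Python 3) of how one line could be split into its frame and its ids. The helper name `parse_line` is hypothetical, and `ast.literal_eval` is an assumption about how the lines might be parsed, not something taken from the original script.

```python
import ast

def parse_line(line):
    """Return (frame, {id: [four values]}) for one line of the file."""
    # literal_eval only accepts Python literals, so it is safer than eval here
    record = ast.literal_eval(line.strip())
    (frame, flat), = record.items()   # each line holds exactly one frame
    ids = flat[0::2]                  # even positions: the ids
    values = flat[1::2]               # odd positions: the 4-value lists
    return frame, dict(zip(ids, values))

frame, items = parse_line("{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}")
print(frame, items)
# 17 {2: [-3, 88, 16, 28], 3: [-6, 84, 20, 32]}
```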

So an example of what I want to write out is:

17 - 36 

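because one of the frames sharing id 3 differs from the file I am comparing against.

For every such difference I need to write out a new file containing only the start frame and the end frame, and I will then concatenate additional strings to each file.

This is the current code, which uses difflib to print every line that differs: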

import difflib

def compare(f1, f2):
    with open(f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
        diff = difflib.ndiff(fin1.readlines(), fin2.readlines())
        # Keep only the lines unique to the first file (prefixed with '- ')
        outcome = ''.join(x[2:] for x in diff if x.startswith('- '))
        print(outcome)

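How could I adapt the block above to achieve this?

Note that the two files share the same number of frames but not the same ids, so I need to write two different files for each difference, probably into a folder. So if the two files have 20 differences, I need to create two main folders, one for each original file, each containing a text file with the start and end frame for every shared id.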

Answer

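Let's assume your list of differences is the file content you gave at the start of your post. I proceed in two steps, first getting the list of frames for each id: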

>>> from collections import defaultdict
>>> diffs = defaultdict(list)
>>> for line in s.split('\n'):
        d = eval(line) # We have a dict
        for k in d: # Only one key, k is the frame
            # The ids sit at the even indices
            for i in range(0, len(d[k]), 2):
                diffs[d[k][i]].append(k)


>>> diffs # We now have a dict with ids as keys : 
defaultdict(<type 'list'>, {10: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 2: [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], 3: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 29: [31, 32, 33, 34, 35, 36]}) 

Now we get the ranges for each id, thanks to this other SO post, which helps to get ranges from a list of indices:

>>> from operator import itemgetter
>>> from itertools import groupby
>>> for id_ in diffs:
        diffs[id_].sort()
        # Consecutive frames share the same (index - value), so groupby splits the runs
        for k, g in groupby(enumerate(diffs[id_]), lambda p: p[0] - p[1]):
            group = list(map(itemgetter(1), g))
            print('id {0} : {1} -> {2}'.format(id_, group[0], group[-1]))


id 10 : 23 -> 36 
id 2 : 17 -> 33 
id 3 : 16 -> 36 
id 29 : 31 -> 36 
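The grouping trick may be easier to follow in isolation. Below is a small Python 3 sketch of the same idea; the function name `consecutive_runs` is hypothetical. Within a run of consecutive integers, the difference between a value's position and the value itself stays constant, so `groupby` starts a new group exactly where a gap appears.

```python
from itertools import groupby

def consecutive_runs(frames):
    """Yield (start, end) for each run of consecutive integers in frames."""
    ordered = sorted(set(frames))
    # index - value is constant inside a run and changes at every gap
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[0] - pair[1]):
        values = [value for _, value in run]
        yield values[0], values[-1]

print(list(consecutive_runs([17, 18, 19, 24, 26, 20])))
# [(17, 20), (24, 24), (26, 26)]
```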

Then, for each id, you have the ranges of differences. I think that with a little adaptation you can get what you want.
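One possible adaptation for the file-writing part of the question is sketched below (Python 3). It assumes the ranges have been collected into a dict mapping each id to a list of `(start, end)` pairs instead of being printed. The function name `write_ranges` and the file naming scheme are hypothetical and only there for illustration.

```python
from pathlib import Path

def write_ranges(ranges, folder):
    """Write one text file per (id, start, end) range into folder; return the paths."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for id_, runs in sorted(ranges.items()):
        for start, end in runs:
            # Hypothetical naming scheme; adjust it to what the later steps expect
            path = out_dir / "id{0}_{1}-{2}.txt".format(id_, start, end)
            # Each file holds only the start frame and the end frame
            path.write_text("{0} {1}\n".format(start, end))
            written.append(path)
    return written
```

Calling it once per original file, each time with a different folder, would give the two main folders described in the question.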

EDIT: here is the final answer, with the compare block:

>>> def compare(f1, f2):
        # 2 embedded 'with' because I'm on Python 2.5 :-)
        with open(f1+'.txt', 'r') as fin1:
            with open(f2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
                # Do not forget the strip function to remove unnecessary '\n'
                diff_lines = [l.strip() for l in lines1 if l not in lines2]
                # Ok, we have our differences (very basic)
                diffs = defaultdict(list)
                for line in diff_lines:
                    d = eval(line) # We have a dict
                    for k in d:
                        list_ids = d[k] # Only one key, k is the frame
                        for i in range(0, len(list_ids), 2):
                            diffs[list_ids[i]].append(k)
                for id_ in diffs:
                    diffs[id_].sort()
                    for k, g in groupby(enumerate(diffs[id_]), lambda p: p[0] - p[1]):
                        group = list(map(itemgetter(1), g))
                        print('id {0} : {1} -> {2}'.format(id_, group[0], group[-1]))

>>> compare(r'E:\CFM\Dev\Python\test\f1', r'E:\CFM\Dev\Python\test\f2') 
id 2 : 17 -> 24 
id 2 : 26 -> 26 
id 3 : 16 -> 24 
id 3 : 26 -> 26 
id 10 : 23 -> 24 
id 10 : 26 -> 26 
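
Note that the version above flags every id on a line as soon as anything on that line differs. If only the ids whose own values changed should be reported, a per-id comparison might look like the following Python 3 sketch. The names `load` and `differing_ranges` are hypothetical, `ast.literal_eval` is swapped in for `eval` as an assumption, and an id missing from the second file is treated as different.

```python
import ast
from collections import defaultdict
from itertools import groupby

def load(lines):
    """Map frame -> {id: values} from lines like '{16:[3, [-7, 87, 20, 32]]}'."""
    frames = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        for frame, flat in ast.literal_eval(line).items():
            frames[frame] = dict(zip(flat[0::2], flat[1::2]))
    return frames

def differing_ranges(lines1, lines2):
    """Return {id: [(start, end), ...]} for frames where that id's values differ."""
    first, second = load(lines1), load(lines2)
    changed = defaultdict(list)
    for frame, items in first.items():
        other = second.get(frame, {})
        for id_, values in items.items():
            if other.get(id_) != values:  # a missing id also counts as different
                changed[id_].append(frame)
    ranges = {}
    for id_, frames in changed.items():
        frames.sort()
        runs = []
        # index - value is constant inside a run of consecutive frames
        for _, run in groupby(enumerate(frames), key=lambda p: p[0] - p[1]):
            run = [f for _, f in run]
            runs.append((run[0], run[-1]))
        ranges[id_] = runs
    return ranges

file1 = ["{16:[3, [-7, 87, 20, 32]]}",
         "{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}",
         "{18:[2, [-1, 88, 16, 28], 3, [-3, 84, 20, 32]]}"]
file2 = ["{16:[3, [-7, 86, 20, 32]]}",
         "{17:[2, [-3, 82, 16, 28], 3, [-6, 84, 20, 32]]}",
         "{18:[2, [-1, 88, 16, 27], 3, [-3, 84, 20, 32]]}"]
print(differing_ranges(file1, file2))
```

With the first three lines of each file from the question, id 3 only differs at frame 16 and id 2 differs at frames 17 and 18, whereas the line-based comparison would report id 3 for all three frames.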

This is very helpful, thank you very much! However, I don't understand what s is on the third line, for line in s.split('\n'): – MaxPower

Oh yes, sorry :-) 's' is just a big string containing all the lines you showed at the beginning; it represents the list of differences. – Emmanuel

My edit shows a complete solution; I hope it is what you need... – Emmanuel
