2015-07-20 74 views
2

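I need to run several different comparisons on class objects (a selective comparison of class objects). In each comparison, only the values of selected fields are taken into account, for example: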

class Class:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value

obj1 = Class(1, 2, 3, 'a')
obj2 = Class(1, 2, 3, 'b')  # compare(obj1, obj2) = True
obj3 = Class(1, 2, 4, 'a')  # compare(obj1, obj3) = False

Currently I do it this way:

def dumm_compare(obj1, obj2):
    if obj1.field1 != obj2.field1:
        return False
    if obj1.field2 != obj2.field2:
        return False
    if obj1.field3 != obj2.field3:
        return False
    return True

Since the actual number of relevant fields is greater than 10, this approach leads to rather bulky code. That is why I tried something like this:

def cute_compare(obj1, obj2):
    for field in filter(lambda x: x.startswith('field'), dir(obj1)):
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True

The code is compact; however, performance suffers badly:

import time 

starttime = time.time() 
for i in range(100000): 
    dumm_compare(obj1, obj2) 
print('Dumm compare runtime: {:.3f} s'.format(time.time() - starttime)) 

starttime = time.time() 
for i in range(100000): 
    cute_compare(obj1, obj2) 
print('Cute compare runtime: {:.3f} s'.format(time.time() - starttime))

#Dumm compare runtime: 0.046 s 
#Cute compare runtime: 1.603 s 

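Is there a more efficient way to implement selective object comparison? I actually need several such functions (they compare objects by different, sometimes overlapping, sets of fields). That is why I don't want to override the built-in class methods.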

+1

Do you know in advance how many fields there are? –

+1

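It would be faster to be explicit about which fields *should* be compared, e.g. with a class attribute 'COMPARE_FIELDS = ['field1', 'field2', ...]', and then iterate over that. – jonrsharpe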
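A minimal sketch of what that comment suggests, assuming the field names are listed explicitly on the class. COMPARE_FIELDS is the name used in the comment; the Record class and the compare_fields function are made-up names for illustration, and this has not been timed against the versions in the question.

```python
class Record:
    # Explicit list of the attributes that take part in the comparison.
    COMPARE_FIELDS = ('field1', 'field2', 'field3')

    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


def compare_fields(obj1, obj2):
    """Return True if every attribute named in COMPARE_FIELDS matches."""
    return all(getattr(obj1, name) == getattr(obj2, name)
               for name in obj1.COMPARE_FIELDS)


a = Record(1, 2, 3, 'a')
b = Record(1, 2, 3, 'b')
c = Record(1, 2, 4, 'a')
print(compare_fields(a, b))  # True
print(compare_fields(a, c))  # False
```

No dir() call or name filtering happens per comparison here; the only work left is the attribute lookups themselves.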

Answers

1

If the fields exist on all instances within a particular comparison set, try saving the list of fields to compare on the class.

def prepped_compare(obj1, obj2):
    li_field = getattr(obj1, "li_field", None)
    if li_field is None:
        # Grab the field list from the first object compared; this assumes
        # a fixed field list per run. Mind you, getattr() on a non-existent
        # field blows up anyway, so you are making that assumption already.
        li_field = [f for f in vars(obj1) if f.startswith('field')]
        obj1.__class__.li_field = li_field

    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True

Or, better, precompute the list outside the function:

def prepped_compare2(obj1, obj2, li_field):
    for field in li_field:
        if getattr(obj1, field) != getattr(obj2, field):
            return False
    return True


starttime = time.time()
li_field = [f for f in vars(obj1) if f.startswith('field')]
for i in range(100000):
    prepped_compare2(obj1, obj2, li_field)
print('prepped2 compare runtime: {:.3f} s'.format(time.time() - starttime))

Output:

Dumm compare runtime: 0.051 s 
Cute compare runtime: 0.762 s 
prepped compare runtime: 0.122 s 
prepped2 compare runtime: 0.093 s 
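Since the question asks for several comparison functions over different, sometimes overlapping, field sets, the precomputed list can also be bound once per function. The following is only a sketch under that assumption: it uses operator.attrgetter from the standard library, the names make_comparer and Sample are hypothetical, and it is not part of the timings above.

```python
from operator import attrgetter


def make_comparer(*field_names):
    """Build a function that compares two objects on the named attributes only."""
    # attrgetter with several names returns a tuple of the attribute values.
    getter = attrgetter(*field_names)

    def compare(obj1, obj2):
        return getter(obj1) == getter(obj2)

    return compare


class Sample:
    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


# Two comparers over overlapping field sets.
compare_all = make_comparer('field1', 'field2', 'field3')
compare_first_two = make_comparer('field1', 'field2')

a = Sample(1, 2, 3, 'a')
b = Sample(1, 2, 3, 'b')
c = Sample(1, 2, 4, 'a')
print(compare_all(a, b))        # True
print(compare_all(a, c))        # False
print(compare_first_two(a, c))  # True
```

Unlike the explicit loop, this fetches every listed attribute from both objects before comparing, so it cannot stop at the first mismatching field.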

Re. overriding __eq__: I'm pretty sure you could do something along these lines.

def mycomp01(self, obj2):
    ...  # compare, possibly with a saved field list01 on the class

def mycomp02(self, obj2):
    ...  # compare, possibly with a saved field list02 on the class

# let's do comp01.
Class.__eq__ = mycomp01
# ... run comp01 tests ...
Class.__eq__ = mycomp02
# ... run comp02 tests ...
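A runnable sketch of that idea, assuming two field lists stored on the class. Item, FIELDS_01 and FIELDS_02 are placeholder names, not something from the question. Bear in mind that reassigning __eq__ changes what == means for every instance of the class until it is reassigned again.

```python
class Item:
    # Hypothetical field lists, one per comparison mode.
    FIELDS_01 = ('field1', 'field2', 'field3')
    FIELDS_02 = ('field1', 'field2')

    def __init__(self, value1, value2, value3, dummy_value):
        self.field1 = value1
        self.field2 = value2
        self.field3 = value3
        self.irrelevant_field = dummy_value


def mycomp01(self, other):
    # Compare on the first saved field list.
    return all(getattr(self, f) == getattr(other, f) for f in self.FIELDS_01)


def mycomp02(self, other):
    # Compare on the second saved field list.
    return all(getattr(self, f) == getattr(other, f) for f in self.FIELDS_02)


a = Item(1, 2, 3, 'a')
c = Item(1, 2, 4, 'a')

Item.__eq__ = mycomp01
result_01 = (a == c)  # field3 differs
Item.__eq__ = mycomp02
result_02 = (a == c)  # field3 is ignored

print(result_01)  # False
print(result_02)  # True
```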
1

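dir() does not only include instance attributes; it also traverses the class hierarchy. So it does a lot more work here; dir() is really only suited to debugging tasks.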
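To make the difference concrete, here is a small illustration; the Sample class is a made-up example, not the class from the question.

```python
class Sample:
    def __init__(self):
        self.field1 = 1
        self.irrelevant_field = 'a'


obj = Sample()

# vars() exposes just the instance __dict__.
instance_names = sorted(vars(obj))
# dir() also collects names from the class and its base classes.
dir_names = dir(obj)

print(instance_names)                        # ['field1', 'irrelevant_field']
print(len(dir_names) > len(instance_names))  # True
print('__init__' in dir_names)               # True
```

Every one of those extra names has to be produced and then filtered out again on each cute_compare() call.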

Stick to using vars() instead, perhaps combined with all():

def faster_compare(obj1, obj2):
    obj2_vars = vars(obj2)
    return all(value == obj2_vars[field]
               for field, value in vars(obj1).items()
               if field.startswith('field'))

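vars() returns a dictionary containing only the instance attributes; in the generator expression above, I access both the attribute name and its value in one step by using the dict.items() method.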

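I replaced the getattr() call for obj2 with the same dictionary approach; this saves a frame-stack push and pop each time, because the key lookup can be handled entirely in bytecode (C code). Note that this does assume you are not using properties; only actual instance attributes will be listed.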
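The caveat about properties can be seen in a short example. The Sample class below is hypothetical: its field2 is a property, so it lives on the class and does not show up in vars().

```python
class Sample:
    def __init__(self, value1):
        self.field1 = value1

    @property
    def field2(self):
        # Computed on access; never stored in the instance __dict__.
        return self.field1 * 2


obj = Sample(1)
print('field1' in vars(obj))  # True
print('field2' in vars(obj))  # False
print('field2' in dir(obj))   # True
print(obj.field2)             # 2
```

A vars()-based comparison would therefore silently skip field2, whereas the getattr()-based versions would include it if it were named explicitly.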

This approach still has to do more work than the hard-coded if branches, but at least it does not perform all that badly:

>>> from timeit import timeit 
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, dumm_compare as compare') 
0.349234500026796 
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, cute_compare as compare') 
16.48695448896615 
>>> timeit('compare(obj1, obj2)', 'from __main__ import obj1, obj2, faster_compare as compare') 
1.9555692840367556 
+0

Shouldn't that be 'return not any(value != obj2_vars[field] ...'? – overactor

+0

@overactor: oops, I got the inversion wrong there; I should have used 'all()'. –