2010-11-15 22 views
0

我需要搜索其他東西中的第一個,最後一個,所有或全部事件。爲了避免重複我自己(DRY),我提出了以下解決方案。濫用收益以避免循環中的條件

感興趣的是兩種Searcher類別的方法search_revisions()collect_one_occurence()

SearcherYield我創建search_revisions()發電機只能放棄了發電機collect_one_occurence()收集的第一個結果之後。在SearcherCondition我把條件放在循環中。這個條件必須在循環的每一次迭代中檢查。

我不能決定我的(ab)使用產量和隨後放棄發電機是否是天才罷工或可怕的黑客攻擊。你怎麼看?你對這種情況有什麼其他想法嗎?

#!/usr/bin/python 

class Revision: 
    # a revision is something like a textfile. 
    # the search() method will search the textfile 
    # and return the lines which match the given pattern. 
    # for demonstration purposes this class is simplified 
    # to return predefined results 
    def __init__(self, results): 
    self.results = results 
    def search(self, pattern): 
    return self.results 

class AbstractSearcher: 
    def __init__(self, revisions): 
    self.revisions = revisions 
    def search_for_first_occurence(self, pattern): 
    keys = sorted(self.revisions.iterkeys()) 
    return self.collect_one_occurence(keys, pattern) 
    def search_for_last_occurence(self, pattern): 
    keys = sorted(self.revisions.iterkeys(), reverse = True) 
    return self.collect_one_occurence(keys, pattern) 
    def search_for_any_occurence(self, pattern): 
    keys = self.revisions.iterkeys() 
    return self.collect_one_occurence(keys, pattern) 
    def search_for_all_occurences(self, pattern): 
    keys = self.revisions.iterkeys() 
    return self.collect_all_occurences(keys, pattern) 

class SearcherYield(AbstractSearcher): 

    def search_revisions(self, keys, pattern): 
    # create generator which yields the results one by one 
    for key in keys: 
     rev = self.revisions[key] 
     result = rev.search(pattern) 
     if result: 
     yield result 

    def collect_one_occurence(self, keys, pattern): 
    # take the first result and then abandon the generator 
    for result in self.search_revisions(keys, pattern): 
     return result 
    return [] 

    def collect_all_occurences(self, keys, pattern): 
    # collect all results from generator 
    results = [] 
    for result in self.search_revisions(keys, pattern): 
     results.extend(result) 
    return results 

class SearcherCondition(AbstractSearcher): 

    def search_revisions(self, keys, pattern, just_one): 
    # collect either all results from all revisions 
    # or break the loop after first result found 
    results = [] 
    for key in keys: 
     rev = self.revisions[key] 
     result = rev.search(pattern) 
     if result: 
     results.extend(result) 
     if just_one: 
      break 
    return results 

    def collect_one_occurence(self, keys, pattern): 
    return self.search_revisions(keys, pattern, just_one = True) 

    def collect_all_occurences(self, keys, pattern): 
    return self.search_revisions(keys, pattern, just_one = False) 

def demo(searcher): 
    print searcher.__class__.__name__ 
    print 'first:', searcher.search_for_first_occurence('foo') 
    print 'last: ', searcher.search_for_last_occurence('foo') 
    print 'any: ', searcher.search_for_any_occurence('foo') 
    print 'all: ', searcher.search_for_all_occurences('foo') 

def main(): 
    revisions = { 
     1: Revision([]), 
     2: Revision(['a', 'b']), 
     3: Revision(['c']), 
     4: Revision(['d','e', 'f']), 
     5: Revision([])} 
    demo(SearcherYield(revisions)) 
    demo(SearcherCondition(revisions)) 

if __name__ == '__main__': 
    main() 

一些上下文:修訂基本上是文本文件。你可以將它們想象成維基頁面的修訂版本。通常有數百次修訂,有時數千次。每個修訂包含多達數千行文本。也有一些情況,只有少數幾行修訂版本。

在修訂中搜索將在文本中搜索一個模式並返回匹配的行。有時有成千上萬的結果,有時沒有結果。

有時我只需要知道是否有任何修訂結果(搜索任何)。有時我必須收集所有結果以進一步處理(搜索全部)。有時候我只需要第一次修改時使用匹配,有時候只需要最後一次修訂(搜索第一個和最後一個)。

+2

這是waaaaaaay過於複雜。不過,我不能告訴你如何解決它,除非你可以提供更多有用的上下文;我可以從你的例子中得到的是你寫了太多的代碼。你在尋找什麼? – katrielalex 2010-11-15 22:56:09

+0

你需要一個術語移植:你最先/最後調用的是真正的最小/最大鍵,並且(有效地)'排序(可迭代)[0]'而不是'min(可迭代)'有點令人震驚。 – 2010-11-16 00:58:25

+0

@JohnMachin:再次閱讀代碼。該代碼沒有執行'sorted(iterable)[0]'。帶有匹配的第一個修訂不一定是排序列表中的第一個修訂。 – lesmana 2010-11-16 13:14:09

回答

2

我做了一個基準。下面是結果:

$ ./benchmark.py 
benchmark with revcount: 1000 timeitcount: 1000 
last, first, yield: 0.902059793472 
last, first, cond: 0.897155046463 
last, all, yield: 0.818709135056 
last, all, cond: 0.818334102631 
all, all, yield: 1.26602506638 
all, all, cond: 1.17208003998 
benchmark with revcount: 2000 timeitcount: 1000 
last, first, yield: 1.80768609047 
last, first, cond: 1.84234118462 
last, all, yield: 1.64661192894 
last, all, cond: 1.67588806152 
all, all, yield: 2.55621600151 
all, all, cond: 2.37582707405 
benchmark with revcount: 10000 timeitcount: 1000 
last, first, yield: 9.34304785728 
last, first, cond: 9.33725094795 
last, all, yield: 8.4673140049 
last, all, cond: 8.49153590202 
all, all, yield: 12.9636368752 
all, all, cond: 11.780673027 

產量和條件的解決方案表現出非常相似的時間。我認爲這是因爲生成器(yield)具有一個帶有條件的循環(如果不是空的或類似的)。我以爲我避免了循環中的狀況,但我只是將它移出了視線。

無論如何,數字表明性能大都相同,所以應該通過可讀性來判斷代碼。我會堅持循環中的條件。我喜歡明確的。

這裏是基準代碼:

#!/usr/bin/python 

import functools 
import timeit 

class Revision: 
    # a revision is something like a textfile. 
    # the search() method will search the textfile 
    # and return the lines which match the given pattern. 
    # for demonstration purposes this class is simplified 
    # to return predefined results 
    def __init__(self, results): 
    self.results = results 
    def search(self, pattern): 
    return self.results 

class AbstractSearcher: 
    def __init__(self, revisions): 
    self.revisions = revisions 
    def search_for_first_occurence(self, pattern): 
    keys = sorted(self.revisions.iterkeys()) 
    return self.collect_one_occurence(keys, pattern) 
    def search_for_last_occurence(self, pattern): 
    keys = sorted(self.revisions.iterkeys(), reverse = True) 
    return self.collect_one_occurence(keys, pattern) 
    def search_for_any_occurence(self, pattern): 
    keys = self.revisions.iterkeys() 
    return self.collect_one_occurence(keys, pattern) 
    def search_for_all_occurences(self, pattern): 
    keys = self.revisions.iterkeys() 
    return self.collect_all_occurences(keys, pattern) 

class SearcherYield(AbstractSearcher): 

    def search_revisions(self, keys, pattern): 
    # create generator which yields the results one by one 
    for key in keys: 
     rev = self.revisions[key] 
     result = rev.search(pattern) 
     if result: 
     yield result 

    def collect_one_occurence(self, keys, pattern): 
    # take the first result and then abandon the generator 
    for result in self.search_revisions(keys, pattern): 
     return result 
    return [] 

    def collect_all_occurences(self, keys, pattern): 
    # collect all results from generator 
    results = [] 
    for result in self.search_revisions(keys, pattern): 
     results.extend(result) 
    return results 

class SearcherCondition(AbstractSearcher): 

    def search_revisions(self, keys, pattern, just_one): 
    # collect either all results from all revisions 
    # or break the loop after first result found 
    results = [] 
    for key in keys: 
     rev = self.revisions[key] 
     result = rev.search(pattern) 
     if result: 
     results.extend(result) 
     if just_one: 
      break 
    return results 

    def collect_one_occurence(self, keys, pattern): 
    return self.search_revisions(keys, pattern, just_one = True) 

    def collect_all_occurences(self, keys, pattern): 
    return self.search_revisions(keys, pattern, just_one = False) 

def benchmark(revcount, timeitcount): 

    lastrev = {} 
    for i in range(revcount): 
    lastrev[i] = Revision([]) 
    lastrev[revcount] = Revision([1]) 

    allrevs = {} 
    for i in range(revcount): 
    allrevs[i] = Revision([1]) 

    last_yield = SearcherYield(lastrev) 
    last_cond = SearcherCondition(lastrev) 
    all_yield = SearcherYield(allrevs) 
    all_cond = SearcherCondition(allrevs) 

    lfy = functools.partial(last_yield.search_for_first_occurence, 'foo') 
    lfc = functools.partial(last_cond.search_for_first_occurence, 'foo') 
    lay = functools.partial(last_yield.search_for_all_occurences, 'foo') 
    lac = functools.partial(last_cond.search_for_all_occurences, 'foo') 
    aay = functools.partial(all_yield.search_for_all_occurences, 'foo') 
    aac = functools.partial(all_cond.search_for_all_occurences, 'foo') 

    print 'benchmark with revcount: %d timeitcount: %d' % (revcount, timeitcount) 
    print 'last, first, yield:', timeit.timeit(lfy, number = timeitcount) 
    print 'last, first, cond:', timeit.timeit(lfc, number = timeitcount) 
    print 'last, all, yield:', timeit.timeit(lay, number = timeitcount) 
    print 'last, all, cond:', timeit.timeit(lac, number = timeitcount) 
    print ' all, all, yield:', timeit.timeit(aay, number = timeitcount) 
    print ' all, all, cond:', timeit.timeit(aac, number = timeitcount) 

def main(): 
    timeitcount = 1000 
    benchmark(1000, timeitcount) 
    benchmark(2000, timeitcount) 
    benchmark(10000, timeitcount) 

if __name__ == '__main__': 
    main() 

關於我的系統的一些信息:

$ lsb_release -a 
No LSB modules are available. 
Distributor ID: Ubuntu 
Description: Ubuntu 10.04.1 LTS 
Release: 10.04 
Codename: lucid 
$ uname -a 
Linux lesmana-laptop 2.6.32-26-generiC#46-Ubuntu SMP Tue Oct 26 16:46:46 UTC 2010 i686 GNU/Linux 
$ python --version 
Python 2.6.5 
$ cat /proc/cpuinfo | grep name 
model name : Intel(R) Pentium(R) M processor 1.60GHz 
0

這將解決你的問題,如果lookup_item是不可改變的,COLLEC是任何有序集合:

positions = [i for i, item in enumerate(collec) if item==lookup_item]

這實際上將返回那裏lookup_item發生在COLLEC所有位置。

+0

這不是以任何方式解決我的問題。我不主要需要匹配修訂版的索引,而是需要修訂版中的匹配行。此外,搜索不會比較平等,它會搜索修訂行中的模式。此外,當每次有成千上萬行的修改時,我只想要第一次匹配,因此在第一次命中後中止搜索循環會有很大的不同。 – lesmana 2010-11-16 13:21:55

+1

「我需要搜索其他東西中的第一個,最後一個或所有東西。」 - 我認爲你沒有清楚地說明你的問題 – vonPetrushev 2010-11-16 14:14:24

0

我個人是贊成收益率儘可能的可讀性,但它是一個千鈞一髮。除了我認爲它是一個很好的代碼構造並且適用於許多情況之外,我其實沒有太多的理由。

您可能已經知道這一點,但代碼將需要返回給調用者的匹配修訂。修改搜索方法返回結果時,修改代碼最少的方法是將鏈接返回到版本。

您可以通過將python itertools模塊與yield一起使用來減少代碼。可讀性可以說是去廢話,但它是如此赫然怪胎圓滑:

from itertools import chain,repeat,islice,ifilter 
def collect_one_occurence(self, keys, pattern): 
    return chain(ifilter(None,(rev.search(pattern) for rev in (self.revisions[key] for key in keys)),repeat([]).next() 

def collect_all_occurences(self, keys, pattern): 
    return list(chain(*[rev.search(pattern) for rev in (self.revisions[key] for key in keys)])) 

很明顯,你可以展開代碼,使其更具可讀性,但我把它摺疊以基準目的..不知道多少,這改善了當前的結果?