避免循環後重復代碼？

在使用循環時，我經常最終編寫了一段代碼。例如，雖然渡過了Udacity計算機科學課程，我寫的代碼（一個函數來找到最順序重複元素）：避免循環後重復代碼？

def longest_repetition(l): 
    if not l: 
     return None 
    most_reps = count = 0 
    longest = prv = None 
    for i in l: 
     if i == prv: 
      count += 1 
     else: 
      if count > most_reps: 
       longest = prv 
       most_reps = count 
      count = 1 
     prv = i 
    if count > most_reps: 
     longest = prv 
    return longest

在這種情況下，我檢查兩次，如果計數大於比以前最重複的元素。噹噹前元素與最後一個元素不同且當我到達列表的末尾時，會發生這種情況。

當我按字符解析一個字符串時，我也碰到過幾次。還有幾次，它已經達到約5行代碼。這是常見的，還是我認爲/代碼的結果。我該怎麼辦？

編輯：同樣，在一個人爲的字符串分割例如：

def split_by(string, delimeter): 
    rtn = [] 
    tmp = '' 
    for i in string: 
     if i == delimeter: 
      if tmp != '': 
       rtn.append(tmp) 
       tmp = '' 
     else: 
      tmp += i 
    if tmp != '': 
     rtn.append(tmp) 
    return rtn

編輯：這是從預期不具有的Python的任何外部知識，誰是課程的學生寫的考試;只是以前的單位教過的。雖然我確實有過Python的使用經驗，但我試圖遵循這些限制來充分利用課程。像str.split，列表和許多Python基礎知識都被教授，但還沒有進口 - 尤其是不像groupby。也就是說，如果沒有任何可能不會在程序設計入門課程中教授的語言特性，應該如何編寫它。

來源

2012-06-22 mowwwalker

請使用'如果some_string：'檢查'some_string'不爲空 – jfs

'如果不是l'是多餘的。 'l'是個壞名字。 'most_reps'可以被稱爲'max_count'來清除與'count'的關係。 'i' - >'current'，'prv' - >'last' – jfs

既然你標記language-agnostic，我知道你不會是在Python的特有的東西，你可以用它來使你的代碼效率高，結構緊湊，可讀性非常感興趣。出於同樣的原因，我不打算展示如何用python編寫代碼。

在某些情況下，根據您的算法可以避免額外if，但大多數情況下它就像「如果存在，它應該是重要的和/或有效的」。我不知道python解釋器是如何工作的，但是在像C/C++ /等編譯語言中。編譯器執行各種循環優化，包括將if-blocks移出循環（如果它執行相同的操作）。

我跑和比較各種片段的運行時間：

@JFSebastian - 8.9939801693
@srgerg - 3.13302302361
你 - 2.8182990551。

尾隨的if爲您提供最佳時間，這不是一個普遍現象。我的觀點是：只要按照你的算法，並嘗試優化它。最後if沒有什麼問題。可能替代解決方案是昂貴的。

關於你提出的第二個例子：該檢查tmp == ''做是爲了確保只返回非空字符串。這實際上是分裂算法的一種附加條件。無論如何，循環後您需要額外的rtn.append，因爲還有一些超出了最後一個分隔符。你總是可以再次按下的狀態，如果像if curCharIndex == lastIndex: push items to list循環將在每次迭代執行，它的排序相同的情況下內。

我的答案很簡單：

您的代碼是你的算法，你心裏有一樣有效。
if最終遇到了很多情況 - 不需要擔心它們，它們可能會使代碼比沒有這種if的替代方法更有效率（例子就在這裏）。
另外編譯器還可以發現和修改/左右移動你的代碼塊。
如果有一個語言特性/庫，使你的代碼快速，並在同一時間讀取，使用它。（其他的答案在這裏指出什麼蟒蛇報價:)）

來源

2012-06-22 05:04:45 SuperSaiyan

+1點製作精良 – xvatar

優化的第一條規則：**不要**。第二個 - *還沒有*。 [讓它工作，把它做好，讓它快速]（http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast）。 – jfs

在循環內部檢查循環結束時，需要重新檢查條件並不少見。如果您準備犧牲一點效率，避免重複檢查的一種方法是在循環內過度檢查它。例如：

def my_longest_repetition(l): 
    if not l: 
     return None 
    most_reps = count = 0 
    longest = prv = None 
    for i in l: 
     count = (count + 1) if i == prv else 1 
     if count > most_reps: 
      longest = prv 
      most_reps = count 
     prv = i 
    return longest

此代碼檢查count > most_reps往往比實際需要的，但避免了需要循環後再次檢查。

不幸的是，這種改變並不適用於所有情況。

來源

2012-06-22 04:26:37 srgerg

額外的檢查是爲什麼我寫它的其他方式，但。我認爲在循環結束時再進行一次比較比常量比較便宜。 – mowwwalker

看看itertools.groupby的執行情況，它幾乎完全符合你的要求。 http://docs.python.org/library/itertools.html#itertools.groupby

下面是使用算法所述代碼：

from itertools import groupby 

string = "AAABBCCDDDD" 

maximum = 0 
max_char = "" 

for i in groupby(string): 
    x, xs = i 
    n = len(list(xs)) 
    if n > maximum: 
     max_char = x 
     maximum = n 

print max_char

我的思考在未來寫算法，這樣的建議是，儘量不要在一個函數所做的一切。考慮解決您嘗試解決的問題的小函數，例如「將序列中的每個相等項目的序列分組爲較小的序列」。

當然，它不一定是上述算法中的字符 - 它可以是任何可分組的東西。

編輯：爲響應OP的編輯，我想你不會被允許使用/知道類似於在一個類中設置itertools庫，但我不建議你應該依靠外部庫，但更你應該把問題分解成更小的子問題來思考問題。所以在這種情況下，你會實現你自己的groupby並使用它。

來源

2012-06-22 04:29:17 Wes

[可以簡化]（http://stackoverflow.com/a/11150325/4279） – jfs

我曾考慮編寫列表理解/生成器表達式版本，但後來我認爲這對他來說太過撲朔迷離。 – Wes

我的代碼中唯一的genexpr是用'sum（1 for _in xs）'來代替你的'len（list（xs））'（它消耗無內存的內存）'。最難理解的是'groupby（）'比較起來，其他的都很簡單。 – jfs

避免在循環後重復條件的語言無關技術將sentinel值附加到輸入數據，例如，如果附加到string結束delimiter則條件不在split_by()必要的。典型示例：在線性搜索算法中，針可以附加到乾草堆上以避免序列檢查的結束。

另一種選擇是委託一些工作例如一個單獨的函數，一個函數計算重複的次數，另外發現最大爲longest_repetition()：

from itertools import groupby 

def longest_repetition(iterable): 
    return max(groupby(iterable), key=lambda x: sum(1 for _ in x[1]))[0]

如果重複的代碼是微不足道;這可能是不值得的。

來源

2012-06-22 04:46:19 jfs

我認爲有，可以幫助您避免在循環結束重複碼三種一般方法。對於所有三個，我將使用一個與你自己稍有不同的示例問題，計算字符串中的單詞。這裏有一個「默認」的版本，像你的代碼，重複一些邏輯，在循環的末尾：

from collections import Counter 

def countWords0(text): 
    counts = Counter() 
    word = "" 

    for c in text.lower(): 
     if c not in "abcdefghijklmnopqrstuvwxyz'-": 
      if word: 
       counts[word] += 1 
      word = "" 
     else: 
      word += c 

    if word: 
     counts[word] += 1 # repeated code at end of loop 

    return counts

第一種方法是每一個字符後做（一些）的「序列結束」處理，如果序列在該字符之後立即結束，那麼簿記是正確的。在你的例子中，你可以消除你的「其他」條件，並且每次運行代碼。（這是sergerg的答案。）

這可能並不容易一些各式各樣的檢查，但。爲了統計單詞，您需要添加一些額外的邏輯，以避免從您處理的「部分」子序列中累積雜音。這裏的代碼，不會說：

def countWords1(text): 
    counts = Counter() 
    word = "" 

    for c in text.lower(): 
     if c not in "abcdefghijklmnopqrstuvwxyz'-": 
      word = "" 
     else: 
      if word: 
       counts[word] -= 1 # new extra logic 
      word += c 
      counts[word] += 1 # this line was moved from above 

    return counts + Counter() # more new stuff, to remove crufty zero-count items

第二個選擇是追加的警戒值，這將觸發行爲的期望「的序列結束」序列的末端。如果您需要避免哨兵污染您的數據（尤其是數字等），這可能會非常棘手。對於最長的連續子序列問題，可以添加任何不等於序列中最後一項的值。 None可能是一個不錯的選擇。對於我的話計數例如，非文字字符（如換行符）會做：

def countWords2(text): 
    counts = Counter() 
    word = "" 

    for c in text.lower() + "\n": # NOTE: added a sentinel to the string! 
     if c not in "abcdefghijklmnopqrstuvwxyz'-": 
      if word: 
       counts[word] += 1 
      word = "" 
     else: 
      word += c 

    # no need to recheck at the end, since we know we ended with a space 

    return counts

第三種方法是更改代碼的結構，以避免循環訪問可能會意外結束的序列。您可以使用發電機來預處理序列，如在使用groupby從itertools其他的答案。（當然，發電機的功能，如果你有他們自己寫的，可能有類似的問題。）

對於我的單詞統計例子，我可以使用正則表達式從re模塊找到的話：

from re import finditer 

def countWords3(text): 
    return Counter(match.group() for match in 
        finditer("[\w'-]+", text.lower()))

輸出，給予適當Python的文本時（這是所有四個版本countWords相同）：

>>> text = """Well, there's egg and bacon; egg sausage and bacon; 
       egg and spam; egg bacon and spam; egg bacon sausage and spam; 
       spam bacon sausage and spam; spam egg spam spam bacon and spam; 
       spam sausage spam spam bacon spam tomato and spam; 
       spam spam spam egg and spam; spam spam spam spam spam spam 
       baked beans spam spam spam; or Lobster Thermidor a Crevette 
       with a mornay sauce served in a Provencale manner with shallots 
       and aubergines garnished with truffle pate, brandy and with a 
       fried egg on top and spam.""" 

>>> countWords0(text) 
Counter({'spam': 28, 'and': 12, 'egg': 8, 'bacon': 7, 'sausage': 4, 'a': 4, 
     'with': 4, 'well': 1, 'lobster': 1, 'manner': 1, 'in': 1, 'top': 1, 
     'thermidor': 1, "there's": 1, 'truffle': 1, 'provencale': 1, 
     'sauce': 1, 'brandy': 1, 'pate': 1, 'shallots': 1, 'garnished': 1, 
     'tomato': 1, 'on': 1, 'baked': 1, 'aubergines': 1, 'mornay': 1, 
     'beans': 1, 'served': 1, 'fried': 1, 'crevette': 1, 'or': 1})

來源

2012-06-22 06:23:52 Blckknght

爲了更好地[分離關注點]（http://en.wikipedia.org/wiki/Separation_of_concerns）（如果代碼是針對初學者的話，即使（或尤其）也應該存在）計數代碼和文本分割代碼應與上一個例子分開（'Counter'計數，genexpr生成單詞）。 – jfs

迭代器提供了一個很好的方式來打破循環：

def longest_repetition(l): 
    i=iter(l) 
    n=next(i,None) 
    longest=None 
    most_reps=0 
    while n is not None: 
    p=n 
    count=0 
    while p==n: 
     n=next(i,None) 
     count+=1 
    if count>most_reps: 
     most_reps=count 
     longest=p 
    return longest

許多語言有類似的概念。

來源

2012-06-22 07:20:47

避免循環後重復代碼？

回答

相關問題