Is there a more elegant, Pythonic way to write this logic?

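I'm new to Python and have been playing with it on simple tasks. I have a bunch of CSVs that I need to manipulate in complex ways, but for the sake of learning Python I'm breaking that down into smaller tasks.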

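For now, given a list of strings, I want to remove user-defined title prefixes from any names in those strings. Any string that contains a name will contain only the name, with or without a title prefix. I have the following, which works, but it feels unnecessarily complicated. Is there a more Pythonic way to do this? Thanks!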

# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
            else:
                if title_prefix == title_prefixes[len(title_prefixes)-1]:
                    new_csv_line.append(item)
                else:
                    continue
    return new_csv_line

if __name__ == "__main__": 
    test_csv_line = ['Mr. Richard Stallman', 'I like cake', 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme'] 
    test_prefixes = ['Mr.', 'Ms.', 'Mrs.'] 
    print(strip_titles(test_csv_line, test_prefixes))

Source

2010-09-24 bsamek

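What about "Ms Jane Doe" and "Mrs Betty Bloggs" and "Mr Fred Nerk"? There are plenty of pedants [abbreviation vs. contraction], keystroke-thrifty folk, and Ms-hating folk out there. And "Miss Hildegarde Higgs"? – 2010-09-24 02:12:21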

@John Thankfully, that's not an issue, because the data comes from another source that provides it in a consistent scheme. – bsamek 2010-09-24 02:22:23

"Consistent data source"? I'll file that under "Famous Last Words" :-) – 2010-09-24 03:13:31

import re
[re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]
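To try the one-liner on the sample row from the question, a minimal sketch might look like this. The name `stripped` is only illustrative; the pattern is the one shown above.

```python
import re

# Sample row taken from the question.
test_csv_line = ['Mr. Richard Stallman', 'I like cake',
                 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']

# Remove a leading "Mr.", "Ms." or "Mrs." plus the whitespace that follows it.
stripped = [re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]
print(stripped)
```

Strings that do not start with one of the three titles pass through unchanged, because `re.sub` returns its input when nothing matches.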

Source

2010-09-24 02:04:34

Wow. Very cool. However, it leaves a space in front of the name when it removes the prefix. – bsamek 2010-09-24 02:07:28

I never get tired of seeing the beauty of regular expressions. – 2010-09-24 02:27:36

@paracaudex: You probably saw my first version when you commented. The current version strips all whitespace after the prefix. – 2010-09-24 02:37:48

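Assuming prefixes is variable, perhaps as an aspect of localization, or that you'd rather not use regular expressions for some other reason, you could do something like this (untested code):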

def strip_title(string, prefixes):
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

stripped = (list(strip_title(cell, prefixes) for cell in line)
            for line in lines)

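This isn't particularly efficient, since the algorithm ends up performing a lot of redundant checks (for example, if a line starts with M, that is checked three times). This sort of thing is an important reason to use regular expressions.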
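Since the code above is marked as untested, here is a small self-contained check against the sample data from the question. The variables `prefixes` and `lines` (holding a single row) are assumptions made for illustration, and a nested list comprehension stands in for the lazy generator so the result can be printed directly.

```python
def strip_title(string, prefixes):
    # Return the string without the first matching "prefix + space".
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

# Hypothetical inputs, borrowed from the question's test data.
prefixes = ['Mr.', 'Ms.', 'Mrs.']
lines = [['Mr. Richard Stallman', 'I like cake',
          'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']]

# Build the full result eagerly instead of as a generator.
stripped = [[strip_title(cell, prefixes) for cell in line] for line in lines]
print(stripped)
```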

Alternatively, you could build a regular expression dynamically by escaping each prefix and joining them as | branches:

def TitleStripper(prefixes):
    import re
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))
    def strip_title(string):
        return prefix_re.sub('', string, 1)
    return strip_title

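The function TitleStripper creates a closure, strip_title, that works like the previous one but is specialized for a particular set of prefixes. After calling strip_title = TitleStripper(prefixes), you can call strip_title(string).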
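As a rough illustration of that calling pattern, the sketch below repeats the factory so it runs on its own and applies the resulting closure to two strings from the question. The sample inputs are assumptions, and `count=1` is passed by keyword purely for readability.

```python
import re

def TitleStripper(prefixes):
    # Escape each prefix so '.' is matched literally, then join with '|'.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))

    def strip_title(string):
        # Remove at most one leading "prefix + space".
        return prefix_re.sub('', string, count=1)
    return strip_title

# Build the specialized function once, then reuse it for every cell.
strip_title = TitleStripper(['Mr.', 'Ms.', 'Mrs.'])
print(strip_title('Mrs. Margaret Thatcher'))
print(strip_title('I like cake'))
```

The regular expression is compiled once, inside the factory, so repeated calls to the closure reuse the same compiled pattern.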

Mainly because it uses regular expressions, this will be somewhat faster than the first approach, perhaps at the expense of clarity.

If you really only need to check three prefixes, either of these approaches is overkill, and you should just use a static RE, as explained in another answer.

Source

2010-09-24 02:16:22 intuited

Why do I need to escape each prefix? – bsamek 2010-09-24 02:26:23

For example, you need to escape '.', i.e. replace it with '\.', so that it doesn't match any character. You can do that with [re.escape](http://docs.python.org/library/re.html#re.escape). – intuited 2010-09-24 02:50:04

Ah, I see. I thought you meant escaping the whole thing, like \Mr. I didn't realize there was an escape function. – bsamek 2010-09-24 03:10:34
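To make the point in these comments concrete, the following sketch compares an unescaped pattern with one built via `re.escape`. The variable names and the test string 'Mrs Smith' are chosen only for illustration.

```python
import re

# Without escaping, '.' is a wildcard that matches any character.
unescaped = re.compile('^Mr.')
# With re.escape, '.' only matches a literal dot.
escaped = re.compile('^' + re.escape('Mr.'))

print(re.escape('Mr.'))
print(bool(unescaped.match('Mrs Smith')))  # the 's' is accepted by '.'
print(bool(escaped.match('Mrs Smith')))
print(bool(escaped.match('Mr. Smith')))
```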

A more Pythonic way is to replace the "end of list" check inside the for item in line: loop with an else clause on the inner for loop.

# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
        else:
            new_csv_line.append(item)
    return new_csv_line

The logic is otherwise the same as yours: the else is executed if the for loop completes without being broken out of.
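For readers unfamiliar with for/else, a stripped-down sketch of the same control flow may help. The helper `find_prefix` is hypothetical and exists only to show when the else branch runs.

```python
def find_prefix(item, prefixes):
    """Return the first prefix that item starts with, or None."""
    for prefix in prefixes:
        if item.startswith(prefix):
            found = prefix
            break
    else:
        # Reached only when the loop finished without hitting break,
        # which includes the case of an empty prefix list.
        found = None
    return found

print(find_prefix('Mr. Richard Stallman', ['Mr.', 'Ms.', 'Mrs.']))
print(find_prefix('I like cake', ['Mr.', 'Ms.', 'Mrs.']))
```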

Source

2010-09-24 02:24:04
