2013-08-25 28 views
-3

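Using regular expressions to 'clean' a list of names

I am using regular expressions to clean up a list of names so that they follow a normal format. Say the list is: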

000000AAAAAARob Alsod  ## Notice multiple 0's and A's? 
AAAPerson Person   ## Here, too 
Jeff the awesome Guy  ## Four words... 
Jenna DEeath    ## A name like this can exist. 
GEOFFERY EVERDEEN   ## All caps 
shy guy     ## All lowercase 
Theone Normalperson  ## Example name. This one is fine. 
    Guywith Whitespace  ## Trailing or leading whitespace is a nono. 

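As you can see, people format their names incorrectly, so I need a program that strips out everything unnecessary. This includes:

  • Numbers at the start of the name.

  • Any capital letter that is not followed by a lowercase letter, e.g. AAAAAAAJosh.

  • Anything that is all uppercase.

  • Anything that does not start with a capital letter, e.g. josh.

  • Trailing and leading whitespace.

I think that is everything I need to filter out. The end product should look like this: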

Rob Alsod    ## No more 0's and A's. 
Person Person   ## No more leading A's (or other letters). 
Jeff Guy    ## No lowercase words in his name. 
Jenna DEeath   ## HASN'T removed the D in the middle. 
         ## Name removed as it was all uppercase. 
         ## Name removed as it was all lowercase. 
Theone Normalperson ## Nothing changed. 
Guywith Whitespace  ## Removed whitespace. 

EDIT: Sorry. Here is my current code:

# Enter your code for "Name Cleaning" here. 
import re 
namenum = [] 
num = 0 
for sen in open('file.txt'): 
    namenum += [sen.split(',')] 
    namenum[num][0] = re.sub(r'\s[a-z]+', '', namenum[num][0]) 
    namenum[num][0] = re.sub(r'^([0-9]*)', '', namenum[num][0]) 
    namenum[num][0] = re.sub(r'^[A-Z]*?\s[A-Z]*?$', '', namenum[num][0]) 
    namenum[num][0] = re.sub(r'[^a-zA-Z ][A-Z]*(?=[A-Z])', '', namenum[num][0]) 
    namenum[num][0] = re.sub(r'\b[a-z]+\b', '', namenum[num][0]) 
    namenum[num][0] = re.sub(r'^\s*', '', namenum[num][0]) 
    namenum[num][0] = re.sub(r'\s*$', '', namenum[num][0]) 
    if namenum[num][0] == '':
        namenum[num][0] = 'Invalid Name'
    num += 1 
for i in range(len(namenum)): 
    namenum[i][1] = int(namenum[i][1].strip()) 
namenum = sorted(namenum, key=lambda item: (-item[1], item[0])) 
for i in range(0, len(namenum)): 
    print(namenum[i][0]+','+str(namenum[i][1])) 

It does half the job, but for some reason it misses certain things.

Here is the output:

AAAAAARob Alsod 
AAAPerson Person 
Guywith Whitespace 
Invalid Name 
Invalid Name 
Jeff Guy 
Jenna DEeath 
Theone Normalperson 

I also tried entering a name like harry hamilton, and it gave back harry, which should have been removed.

+1

You do have something you have *tried*, right? Where is your code so far? We are not a free code factory. -1, closevoted – Doorknob

+0

Sorry. I have edited the OP. –

+0

What does it miss? – Michelle

Answer

1

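This regex rejects all of your invalid examples. None of your examples needs the for loop that filters banned words, but I assume you will want it.

This code simply drops every invalid name from the list, but it should be easy to modify so that it asks the user for new input instead. It also does not tell you why a name is invalid, but you could just display all the rules.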

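from re import match

# Words that disqualify a name (compared case-insensitively).
bannedWords = ('really', 'awesome')

def rules(name):
    # Reject any name that contains a banned word.
    for badWord in bannedWords:
        if badWord in name.lower():
            return False
    # Accept only capitalized words separated by single spaces.
    return match(r'^([A-Z][a-z]+(?:[A-Z]?[a-z]+)* ?){1,}$', name)

# Named `names` rather than `input` to avoid shadowing the built-in.
names = ['000000AAAAAARob Alsod', 'AAAPerson Person', 'Jeff the awesome Guy', 'Jenna DEeath', 'GEOFFERY EVERDEEN', 'shy guy', 'Theone Normalperson', ' Guywith Whitespace', 'Someone Middlename MacIntyre', '', 'Jack Really Awesome']
results = list(filter(rules, names))  # list() so this also works on Python 3
print(results)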

This produces:

['Theone Normalperson', 'Someone Middlename MacIntyre'] 
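
As noted above, the filter does not say why a name was rejected. A minimal sketch of one way to report a reason follows. The `explain` function, the `CHECKS` table, and the wording of each label are assumptions made for illustration, not part of the original answer, and the banned-word check is left out to keep it short.

```python
import re

# Pattern copied from the answer's rules() function.
NAME_PATTERN = re.compile(r'^([A-Z][a-z]+(?:[A-Z]?[a-z]+)* ?){1,}$')

# Hypothetical (label, regex) pairs; each regex detects one kind of problem.
CHECKS = [
    ('leading or trailing whitespace', re.compile(r'^\s|\s$')),
    ('contains a digit', re.compile(r'[0-9]')),
    ('two capitals in a row', re.compile(r'[A-Z]{2}')),
    ('word starting with a lowercase letter', re.compile(r'\b[a-z]')),
]

def explain(name):
    """Return a list of reasons the name is rejected; empty if it passes."""
    if NAME_PATTERN.match(name):
        return []
    reasons = [label for label, rx in CHECKS if rx.search(name)]
    # Fall back to a generic message when no specific check fires (e.g. '').
    return reasons or ['does not match the name pattern']

for candidate in ['Theone Normalperson', 'shy guy', 'Jenna DEeath', '']:
    print(repr(candidate), explain(candidate))
```

The overall pattern stays the single source of truth for accept or reject; the extra checks only describe a failure, so they can be extended without changing which names pass.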

Without the for loop:

from re import match

def rules(name):
    # Accept only capitalized words separated by single spaces.
    return match(r'^([A-Z][a-z]+(?:[A-Z]?[a-z]+)* ?){1,}$', name)

# Named `names` rather than `input` to avoid shadowing the built-in.
names = ['000000AAAAAARob Alsod', 'AAAPerson Person', 'Jeff the awesome Guy', 'Jenna DEeath', 'GEOFFERY EVERDEEN', 'shy guy', 'Theone Normalperson', ' Guywith Whitespace', 'Someone Middlename MacIntyre', '', 'Jack Really Awesome']
results = list(filter(rules, names))  # list() so this also works on Python 3
print(results)

This produces:

['Theone Normalperson', 'Someone Middlename MacIntyre', 'Jack Really Awesome'] 
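
The code above validates names rather than repairing them. If the goal is the cleaned list shown in the question (for example `Rob Alsod` from `000000AAAAAARob Alsod`), a rough sketch along these lines may be closer. The function name `clean_name` and the choice to return an empty string for a fully rejected name are assumptions, and the rules are inferred only from the question's expected output.

```python
import re

def clean_name(raw):
    """Hypothetical cleaner; returns '' when nothing valid remains."""
    name = raw.strip()                                  # trim outer whitespace
    name = re.sub(r'^[0-9]+', '', name)                 # drop leading digits
    # Drop a leading run of capitals that sits in front of a normal
    # Capital+lowercase start, e.g. 'AAAPerson' -> 'Person'.
    name = re.sub(r'^[A-Z]+(?=[A-Z][a-z])', '', name)
    # Drop words that are entirely lowercase or entirely uppercase.
    words = [w for w in name.split() if not (w.islower() or w.isupper())]
    return ' '.join(words)

samples = ['000000AAAAAARob Alsod', 'AAAPerson Person', 'Jeff the awesome Guy',
           'Jenna DEeath', 'GEOFFERY EVERDEEN', 'shy guy',
           'Theone Normalperson', '    Guywith Whitespace']
for sample in samples:
    print(repr(clean_name(sample)))
```

Because the capital-run rule is anchored at the start of the name, the `D` in `Jenna DEeath` is left alone, matching the question's expected output. A single-letter uppercase word such as an initial would be dropped by the all-uppercase rule, which may or may not be what is wanted.
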
+0

Thank you very much. This should work. –