2016-11-25 30 views

Add missing characters to sentences

I'm working with text in Python, and in several cases characters are missing, for example:

test_list = ['people can t believe','we couldn t be happier','let s not forget'] 

Every apostrophe in test_list is missing, so I wrote a function to add them back:

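def add_apostrophe(sentence):
    words = sentence.split()
    fixed_s = []
    flag = False   # True right after a standalone 't' or 's' has been seen
    buffer_ = ''   # holds "'t" or "'s" until it is attached to the preceding word
    # Walk right to left so the lone letter is seen before the word it belongs to.
    for w in reversed(words):
        if flag:
            fixed_s.append(''.join([w, buffer_]))
            flag = False
            buffer_ = ''
        elif w in ['t', 's']:
            flag = True
            buffer_ = "'{}".format(w)
        else:
            fixed_s.append(w)
    fixed_s = ' '.join(reversed(fixed_s))
    return fixed_s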

This kind of works:

[add_apostrophe(s) for s in test_list] 

["people can't believe", "we couldn't be happier", "let's not forget"] 

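But I think this may break the sentence in some cases, and I haven't tested it exhaustively. Also, this seems like a common problem: is there a library that restores missing apostrophes and some other characters?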
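To make the concern concrete, here is a minimal sketch of the same rule (attach any standalone 't' or 's' to the word before it) run on sentences where the lone letter is not a contraction. The helper name naive_fix and the sample sentences are made up for illustration; it is a simplified forward-pass stand-in for the function above, not the function itself.

```python
def naive_fix(sentence):
    """Attach any standalone 't' or 's' to the word before it (hypothetical helper)."""
    fixed = []
    for word in sentence.split():
        if word in ('t', 's') and fixed:
            # The rule cannot tell a contraction from a genuine single-letter word.
            fixed[-1] += "'" + word
        else:
            fixed.append(word)
    return ' '.join(fixed)


# Intended case: a real contraction is restored.
print(naive_fix('people can t believe'))
# Unintended cases: 't' and 's' are real tokens here, yet they still get attached.
print(naive_fix('she got a t shirt'))
print(naive_fix('the letter s is curved'))
```

The last two calls return "she got a't shirt" and "the letter's is curved", so the rule presumably needs some knowledge of which words actually form contractions.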


Check this link. Maybe you can get some clues from it: https://www.hackerrank.com/contests/nov13/challenges/punctuation-corrector-its/leaderboard – MYGz

Answers


You can do it with a regular expression, but it may not cover every case.

import re 
test_list = ['people can t believe','we couldn t be happier','let s not forget'] 
print([re.sub(r"(\s?)([a-zA-Z]+)\s([a-zA-Z]{1})\s", r"\1\2'\3 ", a) for a in test_list])

Output:

["people can't believe", "we couldn't be happier", "let's not forget"] 

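Explanation of the regex:

(\s?)([a-zA-Z]+)\s([a-zA-Z]{1})\s

(\s?) - matches and captures 0 or 1 whitespace character as group 1
([a-zA-Z]+) - matches and captures 1 or more letters as group 2
\s - matches 1 whitespace character
([a-zA-Z]{1}) - matches and captures exactly 1 letter as group 3
\s - matches 1 whitespace character

\1, \2 and \3 in the replacement string refer to group 1, group 2 and group 3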
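Because the pattern ends with a required \s, a contraction at the very end of a sentence (for example 'people can t') is left untouched, and any single letter is accepted as group 3. A minimal sketch of one possible variant follows, assuming only 't' and 's' should be restored and using a lookahead so the trailing whitespace is optional. The names PATTERN and restore_apostrophes are illustrative and not part of the answer above.

```python
import re

# Assumption: only a standalone 't' or 's' is treated as a lost contraction suffix.
# \b            - start at a word boundary
# ([a-zA-Z]+)   - group 1: the word before the lone letter
# \s            - the single whitespace that replaced the apostrophe
# ([ts])        - group 2: the lone letter
# (?=\s|$)      - lookahead: followed by whitespace or end of string (not consumed)
PATTERN = re.compile(r"\b([a-zA-Z]+)\s([ts])(?=\s|$)")


def restore_apostrophes(sentence):
    """Rejoin 'word t' / 'word s' as "word't" / "word's" (hypothetical helper)."""
    return PATTERN.sub(r"\1'\2", sentence)


test_list = ['people can t believe', 'we couldn t be happier', 'let s not forget']
print([restore_apostrophes(s) for s in test_list])
print(restore_apostrophes('people can t'))  # contraction at the end of the string
```

Like the original pattern, this still cannot tell a real contraction from a genuine single-letter token ('a t shirt' becomes "a't shirt"), so it does not give exhaustive coverage either.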