2015-02-12 58 views
0

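Using strip() in Python: write the function list_of_words, which takes a list of strings such as the one shown here and returns a single flat list of the individual words, with all whitespace and punctuation removed (except apostrophes/single quotes).

My code removes the periods and the spaces, but not the commas or the exclamation mark.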

def list_of_words(list_str): 
    m = [] 
    for i in list_str: 
     i.strip('.') 
     i.strip(',') 
     i.strip('!') 
     m = m+i.split() 
    return m 

print(list_of_words(["Four score and seven years ago, our fathers brought forth on", 
    "this continent a new nation, conceived in liberty and dedicated", 
    "to the proposition that all men are created equal. Now we are", 
    " engaged in a great  civil war, testing whether that nation, or any", 
    "nation so conceived and so dedicated, can long endure!"]) 
+0

I need to use the strip() or split() methods, not the replace method. – 2015-02-12 07:10:48

+0

Short version: return [word.strip('.,!') for part in list_str for word in part.split()] – Matthias 2015-02-12 09:59:46

Answers

2

One of the simplest ways to remove some punctuation and collapse runs of spaces is the re.sub function.

import re 

sentence_list = ["Four score and seven years ago, our fathers brought forth on", 
       "this continent a new nation, conceived in liberty and dedicated", 
       "to the proposition that all men are created equal. Now we are", 
       " engaged in a great  civil war, testing whether that nation, or any", 
       "nation so conceived and so dedicated, can long endure!"] 

# Drop commas, periods and exclamation marks, then trim each sentence.
sentences = [re.sub(r'[,.!]+', '', sentence).strip() for sentence in sentence_list]
# Collapse runs of two or more spaces and join everything into one string.
words = ' '.join(re.sub(r' {2,}', ' ', sentence) for sentence in sentences)

print(words)
# prints: Four score and seven years ago our fathers brought forth on this continent a new nation conceived in liberty and dedicated to the proposition that all men are created equal Now we are engaged in a great civil war testing whether that nation or any nation so conceived and so dedicated can long endure
1

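strip returns a new string; you have to capture that result and apply the remaining strip calls to it. So your code should be changed to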

for i in list_str: 
    i = i.strip('.') 
    i = i.strip(',') 
    i = i.strip('!') 
    .... 

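On a second note, strip only removes the listed characters from the beginning and the end of the string. If you want to remove characters from the middle of the string, consider using replace.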
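Both points can be checked with a short sketch. The sample string and the variable names below are made up for illustration; the last step shows one way to stay within strip() and split() only, by splitting into words first so that the punctuation ends up at the edge of each word.

```python
s = "ago, our!"

s.strip('!')               # the result is discarded, so s is unchanged
print(s)                   # ago, our!

trimmed = s.strip('!')     # capture the returned string
print(trimmed)             # ago, our

# The comma sits in the middle of the string, so strip leaves it alone.
print(trimmed.strip(','))  # ago, our

# Splitting first puts each comma at the end of a word, where strip can reach it.
words = [w.strip('.,!') for w in s.split()]
print(words)               # ['ago', 'our']
```

This only removes punctuation at the edges of each word, which is usually what is wanted here since apostrophes inside words have to stay.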

0

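As already mentioned, you need to assign the result of i.strip() back to i. Also as mentioned, the replace method is the better fit. Here is an example using replace: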

def list_of_words(list_str:list)->list: 
    m=[] 
    for i in list_str: 
     i = i.replace('.','') 
     i = i.replace(',','') 
     i = i.replace('!','') 
     m.extend(i.split()) 
    return m 

print(list_of_words(["Four score and seven years ago, our fathers brought forth on", 
    "this continent a new nation, conceived in liberty and dedicated", 
    "to the proposition that all men are created equal. Now we are", 
    " engaged in a great  civil war, testing whether that nation, or any", 
    "nation so conceived and so dedicated, can long endure!"])) 

As you can see, I also replaced m = m + i.split() with m.extend(i.split()) to make it easier to read.
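For reference, here is a small comparison of the three ways of adding the split words to the list. The sample data is invented for illustration; the point is that extend gives the same flat result as concatenation, while append would nest the whole list as a single element.

```python
words = ["Four", "score"]
more = "and seven".split()   # ['and', 'seven']

joined = words + more        # builds a new flat list

extended = list(words)
extended.extend(more)        # adds each word individually, in place

appended = list(words)
appended.append(more)        # adds the whole list as ONE element

print(joined)                # ['Four', 'score', 'and', 'seven']
print(extended)              # ['Four', 'score', 'and', 'seven']
print(appended)              # ['Four', 'score', ['and', 'seven']]
```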

1

You can use a regular expression, as described in this question. Essentially,

import re 

i = re.sub('[.,!]', '', i) 
0

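It would be better not to rely on your own list of punctuation but to use Python's, and, as others have pointed out, to use a regular expression to remove the characters: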

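import re
import string
punctuations = re.sub("[`']", "", string.punctuation)  # keep backticks and apostrophes
i = re.sub("[" + re.escape(punctuations) + "]", "", i)  # escape ], \, ^ and - inside the class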

There is also string.whitespace, although split already takes care of whitespace for you.
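Since the question asks for strip() and split() rather than replace, the same idea can be sketched without a regular expression. This is only one possible reading of the requirements: STRIP_CHARS is a name introduced here, and it assumes that every character in string.punctuation except the apostrophe should be trimmed from the ends of each word.

```python
import string

# All ASCII punctuation except the apostrophe (assumption taken from the question).
STRIP_CHARS = ''.join(ch for ch in string.punctuation if ch != "'")


def list_of_words(list_str):
    """Return one flat list of words with surrounding punctuation removed."""
    words = []
    for line in list_str:
        # split() with no arguments handles leading, trailing and repeated whitespace.
        for word in line.split():
            cleaned = word.strip(STRIP_CHARS)
            if cleaned:  # skip tokens that were punctuation only
                words.append(cleaned)
    return words


print(list_of_words(["Don't panic, it's fine!", " (really) "]))
# ["Don't", 'panic', "it's", 'fine', 'really']
```

Because strip only touches the ends of each word, apostrophes inside words such as "Don't" survive without any special handling.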