2016-11-25 61 views
1

I am trying to do three things in my code (removing specific punctuation and stop words in Python 2.7):

  • Remove specific punctuation
  • Convert the input to lowercase
  • Remove stop words

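How can I remove the punctuation without using the "join" function? I am new to Python and have not yet managed to remove the stop words in a similar way...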

import string 
s = raw_input("Search: ") #user input 
stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \ 
      "of", "from", "here", "even", "the", "but", "and", "is", "my", \ 
      "them", "then", "this", "that", "than", "though", "so", "are" ] 

PunctuationToRemove = [".", ",", ":", ";", "!" ,"?", "&"] 

while s != "": 
    s1 = "" 

#Deleting punctuations and applying lowercase 
    for c in s:        #for characters in user's input 
     if c not in PunctuationToRemove + " ": #characters that don't include punctuations and blanks 
      s1 = s + c      #store the above result to s1 
      s1 = string.lower(s)   #then change s1 to lowercase 
    print s1 
+3

Why "without using join"? Is this homework, or do you just dislike join? What about regular expressions? Also, your `while` loop never changes `s`, so it runs forever. –
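The regular-expression route mentioned in this comment could look roughly like the sketch below. The helper name `strip_punctuation` and the string `punctuation_chars` are illustrative assumptions, not part of the question; the characters are the same ones listed in `PunctuationToRemove`.

```python
import re

# Assumed for illustration: the question's punctuation marks written as one string
punctuation_chars = ".,:;!?&"

# Build a character class matching any one of those marks;
# re.escape protects characters that are special inside a pattern
pattern = re.compile("[" + re.escape(punctuation_chars) + "]")

def strip_punctuation(text):
    # Replace every matched punctuation mark with the empty string
    return pattern.sub("", text)

print(strip_punctuation("Hello, world!"))
```

No `join` is needed here because the marks are already a single string.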

+0

@tobias_k It is just that I want to get a handle on the main functions I have recently been trying to absorb: https://learnpythonthehardway.org/book/ –

Answers

0

To get rid of all the stop words, you can do this:

[word for word in myString.split(" ") if word not in stopWords] 
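Note that the comprehension returns a list and compares words case-sensitively, so "The" would slip through. A minimal sketch that lowercases first and rebuilds the string without `join`; the helper name `remove_stop_words` and the shortened stop-word list are assumptions for illustration:

```python
# Shortened, illustrative stop-word list (the question uses a longer one)
stopWords = ["a", "i", "it", "am", "the", "is", "my"]

def remove_stop_words(text, stop_words):
    result = ""
    # split() with no argument also swallows repeated blanks
    for word in text.lower().split():
        if word not in stop_words:
            if result:                  # put a blank only between kept words
                result = result + " "
            result = result + word
    return result

print(remove_stop_words("I am the   student", stopWords))  # student
```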
0

I would suggest getting rid of all the punctuation first. This can be done with a for loop:

for forbiddenChar in PunctuationToRemove: 
    s = s.replace(forbiddenChar,"")  #Replace forbidden chars with empty string 

Then you can split the input string s into words using s.split(' '). After that, you can use a for loop to add all the words, in lowercase, to a new string s1.

words = s.split(' ') 
s1 = "" 
for word in words: 
    word = word.lower()   #Lowercase first so that "The" matches the stop word "the" 
    if word not in stopWords: 
        s1 = s1 + word + " " 

s1 = s1.rstrip(" ")   #Strip trailing space 
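Put together, the steps above might look like the following sketch. The wrapper name `clean_query` is hypothetical; the two lists are copied from the question.

```python
stopWords = ["a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
             "of", "from", "here", "even", "the", "but", "and", "is", "my",
             "them", "then", "this", "that", "than", "though", "so", "are"]
PunctuationToRemove = [".", ",", ":", ";", "!", "?", "&"]

def clean_query(s):
    # Step 1: delete each forbidden character
    for forbiddenChar in PunctuationToRemove:
        s = s.replace(forbiddenChar, "")
    # Step 2: lowercase each word and keep it only if it is not a stop word
    s1 = ""
    for word in s.split():
        word = word.lower()
        if word not in stopWords:
            s1 = s1 + word + " "
    # Step 3: strip the trailing space left by the loop
    return s1.rstrip(" ")

print(clean_query("Hello, World! This is my test."))  # hello world test
```

Wrapping the steps in a function also makes it easy to call them once per input inside a loop that actually reads a new value of s each time.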
+0

You forgot to add spaces when building 's1'. You could also get rid of the stop words at that point. –

+0

Hmm. Seems I wasn't paying attention. Edited to include what I meant to say :P – Lolgast

+0

's1 = s1 + " " + word.lower()', I suppose. Or 's1 = '%s %s' % (s1, word.lower())' –

0

How about this:

s = 'I am student! Hello world&.~*~' 
PunctuationToRemove = [".", ",", ":", ";", "!" ,"?", "&"] 
stopWords = set([ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very", \ 
       "of", "from", "here", "even", "the", "but", "and", "is", "my", \ 
       "them", "then", "this", "that", "than", "though", "so", "are" ]) 

# Remove specific punctuations 
s_removed_punctuations = s.translate(None, ''.join(PunctuationToRemove)) 

# Convert input to lowercase 
s_lower = s_removed_punctuations.lower() 

# Remove stop words 
s_result = ' '.join(s for s in s_lower.split() if s not in stopWords).strip() 

print(s_result) 
#student hello world~*~ 
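The two-argument form `s.translate(None, ...)` only works on Python 2 byte strings. For readers on Python 3, a sketch of the equivalent step, assuming the same `s` and `PunctuationToRemove` as above, uses `str.maketrans`:

```python
PunctuationToRemove = [".", ",", ":", ";", "!", "?", "&"]
s = 'I am student! Hello world&.~*~'

# Python 3 only: the third argument of str.maketrans lists characters to delete
table = str.maketrans('', '', ''.join(PunctuationToRemove))
s_removed_punctuations = s.translate(table)

print(s_removed_punctuations)  # I am student Hello world~*~
```

The lowercase and stop-word steps that follow work unchanged on either version.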
+1

You don't need a list comprehension inside `' '.join()`; you can use a generator expression instead. Also, `if not s in stopWords` can be simplified to `if s not in stopWords`. –

+0

@brunodesthuilliers, thank you very much for pointing that out. – SparkAndShine