2016-12-01 50 views

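Removing suffixes from list elements in Python

I have to write a program that reads lines of text until a single "." is entered. For each line I have to remove punctuation, convert everything to lowercase, and remove stop words and suffixes. I have managed all of this except removing the suffixes. As you can see, I tried .strip, but it only accepts one argument and does not actually remove the suffix from the list elements. Any advice, pointers, or help? Thanks.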

stopWords = [ "a", "i", "it", "am", "at", "on", "in", "to", "too", "very",
      "of", "from", "here", "even", "the", "but", "and", "is", "my",
      "them", "then", "this", "that", "than", "though", "so", "are" ]

noStemWords = [ "feed", "sages", "yearling", "mass", "make", "sly", "ring" ] 


# -------- Replace with your code - e.g. delete line, add your code here ------------ 

Text = raw_input("Indexer: Type in lines, that finish with a . at start of line only: ").lower() 
while Text != ".": 
    LineNo = 0 
    x=0 
    y=0 
    i= 0 

    # creates a new string: cycles through the string Text and removes punctuation
    PuncRemover = "" 
    for c in Text: 
     if c in ".,:;!?&'": 
      c="" 
     PuncRemover += c 

    SplitWords = PuncRemover.split() 

    # loops through the SplitWords list, removing the value at x if it is found in the stopWords list
    while x < len(SplitWords)-1: 
     if SplitWords[x] in stopWords: 
      del SplitWords[x] 
     else: 
      x=x+1 

    while y < len(SplitWords)-1: 
     if SplitWords[y] in noStemWords: 
      y=y+1 
     else: 
      SplitWords[y].strip("ed") 
      y=y+1 

    Text = raw_input().lower() 

print "lines with stopwords removed:" + str(SplitWords) 
print Text 
print LineNo 
print x 
print y 
print PuncRemover 

You are only reading input once here; take a look at where `raw_input` is called. – martianwars


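Two things about code style first. You should take a look at the [Python naming conventions](https://www.python.org/dev/peps/pep-0008/#naming-conventions). Capitalized words are usually reserved for classes or type variables. Also, your `while` loops should be `for` loops, since you know how many iterations you are going to perform. As for your question, you need to actually assign the list element that is being changed. For stripping a sequence of characters, see [this question](http://stackoverflow.com/questions/3900054/python-strip-multiple-characters) – danielunderwood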
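
A small sketch of the point about assignment, using only built-in `str` methods. Strings are immutable, so `strip` returns a new string instead of changing the list element, and its argument is treated as a set of characters to remove from both ends, not as a suffix. The names `words` and `suffix` are illustrative and do not come from the question.

```python
words = ["needed", "jumped", "deed"]

# strip() returns a new string; without an assignment the list is unchanged.
words[0].strip("ed")
unchanged = list(words)

# strip("ed") removes any run of 'e'/'d' characters from BOTH ends,
# so it can remove far more than the suffix.
over_stripped = [w.strip("ed") for w in words]

# Testing for the suffix explicitly and slicing removes only the ending;
# assigning by index writes the result back into the list.
suffix = "ed"
for i, w in enumerate(words):
    if w.endswith(suffix):
        words[i] = w[:-len(suffix)]

print(unchanged)      # ['needed', 'jumped', 'deed']
print(over_stripped)  # ['n', 'jump', '']
print(words)          # ['need', 'jump', 'de']
```

The last result shows that naive suffix removal still mangles short words such as "deed", which is presumably what a list like `noStemWords` is meant to guard against.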


The lines are read in so they can be added to a dictionary, which is why it only reads once for now. – Rydooo

Answer


The following function should remove the listed suffixes from the words of any given string.

def removeSuffixs(sentence):

    suffixList = ["ing", "ation"]  # add more as necessary

    words = []
    for word in sentence.split():
        for item in suffixList:
            # only strip a true suffix (not a match in the middle of the word),
            # and never reduce a word to the empty string
            if word.endswith(item) and len(word) > len(item):
                word = word[:-len(item)]
                # collapse a doubled final letter, e.g. "runn" -> "run"
                if len(word) >= 2 and word[-1] == word[-2]:
                    word = word[:-1]
                break
        words.append(word)

    return " ".join(words)

Example:

print(removeSuffixs("climbing running")) # 'climb run' 
print(removeSuffixs("summation")) # 'sum' 

In your code, replace SplitWords[y].strip("ed") with:

SplitWords[y] = removeSuffixs(SplitWords[y])
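
Putting the pieces of this thread together, here is a rough sketch of how one line could be processed end to end with `for`-style iteration instead of index-based `while` loops. It is an illustration under assumptions, not the asker's required solution: `strip_suffix` and `index_line` are hypothetical helper names, the stop word list is shortened, and the suffix list `("ing", "ed")` is a guess at what the assignment expects. Note also that the `while x < len(SplitWords)-1` loops in the question never examine the last word; iterating over the list directly avoids that.

```python
STOP_WORDS = {"a", "i", "it", "the", "is", "my", "and", "to"}  # shortened for the example
NO_STEM_WORDS = {"feed", "sages", "yearling", "mass", "make", "sly", "ring"}
SUFFIXES = ("ing", "ed")  # assumed suffix list, extend as needed
PUNCTUATION = ".,:;!?&'"


def strip_suffix(word):
    """Remove the first matching suffix from the end of word (hypothetical helper)."""
    for suffix in SUFFIXES:
        # require a non-empty stem so that e.g. "ed" itself is left alone
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)]
    return word


def index_line(line):
    """Lowercase, drop punctuation and stop words, then stem the remaining words."""
    cleaned = "".join(c for c in line.lower() if c not in PUNCTUATION)
    kept = [w for w in cleaned.split() if w not in STOP_WORDS]
    # words listed in NO_STEM_WORDS are kept as they are
    return [w if w in NO_STEM_WORDS else strip_suffix(w) for w in kept]


print(index_line("The ring is jumping, and I walked to my feed!"))
# ['ring', 'jump', 'walk', 'feed']
```

Building a new list rather than deleting from `SplitWords` while looping over it sidesteps the index bookkeeping entirely. On Python 3.9 and later, `str.removesuffix` could replace the `endswith` plus slice step.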