Having trouble removing punctuation and capital letters? (Beginner)

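Terrible programmer here. For a class assignment, I have to extract the words from a text document, count them, and sort them. I can't figure out how to strip the punctuation and replace the uppercase letters with lowercase ones. Any guidance would be greatly appreciated.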

docwords={} 
doc=raw_input("Please enter the name of a text file: ") 
docread=open(doc, 'r') 
doclist=[] 



def main(): 
    for x in docread: 
        words = x.split() 
    for word in words: 
        doclist.append(word) 

def wordcount(): 
    main() 
    for counter in doclist: 
        docwords[counter] = docwords.get(counter,0) + 1 

wordcount() 
docread.close() 
for p in sorted(docwords): 
    print p, "-->", docwords[p]

2011-11-14 user1044868

There is a `Counter` class in the standard library (in the `collections` module) that can do the actual word counting.
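
As a rough sketch of what that comment suggests: the snippet below assumes Python 3 (the code in this thread is Python 2), and the function name `count_words` and the sample sentence are made up for illustration. It strips leading and trailing punctuation with `string.punctuation`, lowercases each word, and lets `collections.Counter` do the tallying.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter of lowercased words with edge punctuation stripped."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    # Tokens made only of punctuation strip down to "", so skip them.
    return Counter(w for w in words if w)


counts = count_words("The cat saw the dog. The dog ran!")
for word in sorted(counts):
    print("%s --> %d" % (word, counts[word]))
```

Note that `strip` only removes punctuation at the ends of a word, so an apostrophe inside a word such as "It's" is kept unless it is removed separately.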

This could really all be done in one line (prompt, open, read, split, strip, and lower, all within a list comprehension):

words = [word.strip("!\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~").lower() for word in open(raw_input("Please enter the name of a text file: ").strip(), 'r').read().replace("'", "").split()]

Then print the statistics:

print "Word count: %d" % len(words) 
# words is a list, so count the occurrences of each distinct word
for p in sorted(set(words)): 
    print "%s --> %d" % (p, words.count(p))

Or, long(er):

docwords={} 
doc=raw_input("Please enter the name of a text file: ") 
docread=open(doc, 'r') 
doclist=[] 

def main(): 
    for x in docread: 
        doclist.extend([word.strip("!\"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~").replace("'", "").lower() for word in x.split()]) 

def wordcount(): 
    main() 
    for counter in doclist: 
        docwords[counter] = docwords.get(counter,0) + 1 

wordcount() 
docread.close() 
for p in sorted(docwords): 
    print p, "-->", docwords[p]

2011-11-14 03:18:24 chown

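First, your main isn't doing what you want. Notice what the two for loops do: the first one reads the file line by line and assigns the list of words in each line to words. But you just keep overwriting words over and over, so by the end words holds only the words from the last line. Only then does the second loop put those words into doclist. Start by thinking about how to nest the loops, and fix this part: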

def main(): 
    for x in docread: 
        words = x.split() 
    for word in words: 
        doclist.append(word)

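Now we can move on to the missing pieces. Python has a lot of useful libraries. To lowercase a string, take a look here: http://docs.python.org/library/stdtypes.html#str.lower. To get rid of punctuation, you may find this function helpful for deciding whether a character is a letter: http://docs.python.org/library/stdtypes.html#str.isalpha.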

Since it's homework, I'm hesitant to just hand over code; otherwise you won't learn it. If you get stuck again, speak up.

2011-11-14 03:24:25

Thank you very much. I got the lowercasing and the punctuation working. The for loop was a typo, my apologies. – user1044868

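@user1044868 No need to apologize. If you want to set the record straight, you can edit the question to fix the mistake. Since you're new here, I'll also point out that you should accept an answer to your question, especially if you want your future questions answered.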
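
For later readers, here is a minimal sketch of where the hints in this answer lead. It assumes Python 3 and a hypothetical helper named `count_words_nested` that takes any iterable of lines (an open file would work); it is not the asker's final code. The inner loop is nested inside the outer one, `str.isalpha` filters out anything that is not a letter, and `str.lower` normalizes case.

```python
def count_words_nested(lines):
    """Count words across an iterable of lines, ignoring case and non-letters."""
    counts = {}
    for line in lines:
        # The inner loop sits inside the outer loop, so the words of
        # every line are processed, not only those of the last line.
        for word in line.split():
            # Keep only alphabetic characters, then lowercase the result.
            cleaned = "".join(ch for ch in word if ch.isalpha()).lower()
            if cleaned:
                counts[cleaned] = counts.get(cleaned, 0) + 1
    return counts


result = count_words_nested(["Hello, world!", "It's a small world."])
for word in sorted(result):
    print("%s --> %d" % (word, result[word]))
```

Filtering with `isalpha` also drops digits and apostrophes, so "It's" is counted as "its"; whether that is acceptable depends on the assignment.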

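Removing punctuation

One option is the re.sub function from the regular expression module. In this case, I remove every character that is not a word character or a space (note that \w matches letters, digits, and the underscore).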

>>> import re 
>>> s = "It's ok" 
>>> print re.sub(r'[^\w ]', '', s) 
Its ok

Lowercasing

String objects have a built-in lower method.

>>> 'Its ok'.lower() 
'its ok'
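
Putting the two steps together might look like the sketch below. It assumes Python 3, and `normalize` is a made-up helper name. Unlike the pattern above, it uses `\s` rather than a literal space, so newlines and tabs are kept and the result can still be split into words.

```python
import re


def normalize(text):
    """Lowercase text and drop characters that are neither word characters nor whitespace."""
    return re.sub(r"[^\w\s]", "", text).lower()


print(normalize("It's OK, really!"))  # prints: its ok really
```

Calling `normalize(text).split()` then gives a list of cleaned words ready for counting.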

2011-11-14 03:31:27
