2012-12-20 28 views

Answers

3
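wordcounts = []
with open(filepath) as f:
    text = f.read()
    sentences = text.split('.')
    for sentence in sentences:
        words = sentence.split()  # split on any whitespace, not just ' '
        if words:  # skip empty fragments, e.g. the piece after the final '.'
            wordcounts.append(len(words))
average_wordcount = sum(wordcounts) / float(len(wordcounts))  # float() avoids integer division on Python 2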
+0

Thanks a lot! Thank you very much for your help. –

0

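This should help you. But this is basic stuff; you should at least try it yourself first.

This code assumes that each sentence is on its own line.

If that is not the case, you can either adapt the code or say so in your question, which is not clear on this point.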

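def read_lines_from_file(file_name):
    # Yield each line with surrounding whitespace removed
    with open(file_name, 'r') as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines so they are not counted as sentences
                yield line

def average_words(sentences):
    counts = []
    for sentence in sentences:
        counts.append(len(sentence.split()))  # number of words, not the word list itself
    return sum(counts) / float(len(counts))  # convert before dividing to avoid integer division

print(average_words(read_lines_from_file(file_name)))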
4

The simple way:

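sents = [s for s in text.split('.') if s.strip()]  # drop empty pieces, e.g. after the final '.'
avg_len = sum(len(x.split()) for x in sents) / float(len(sents))  # float() avoids integer division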

The serious way: use nltk to tokenize the text according to the rules of the target language.
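With nltk, `nltk.sent_tokenize(text)` splits text into sentences (it needs the `punkt` data downloaded first). If installing nltk is not an option, a rough stdlib-only stand-in is to split after `.`, `!` or `?` when followed by whitespace. This is a regex sketch, not nltk's tokenizer: the helper names `split_sentences` and `average_sentence_length` are hypothetical, and abbreviations such as "Dr. Smith" will still be split incorrectly.

```python
import re

# Split point: whitespace that directly follows '.', '!' or '?'.
# A crude approximation of sentence tokenization (not nltk).
_SENTENCE_END = re.compile(r'(?<=[.!?])\s+')


def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def average_sentence_length(text):
    """Average number of whitespace-separated words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    total_words = sum(len(s.split()) for s in sentences)
    return total_words / float(len(sentences))  # float() keeps Python 2 from flooring


print(average_sentence_length("Hello there. How are you? I am fine!"))
```

Because the split requires whitespace after the punctuation, a decimal such as "3.14" is not treated as a sentence boundary, which the plain `text.split('.')` approach gets wrong.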

+2

Mind the integer division! – moooeeeep

+0

Thanks. I have to do it the naive way. –
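The integer-division warning matters on Python 2, where `/` between two ints floors the result (`7 / 2` gives `3`). A minimal sketch that behaves the same on Python 2 and 3 converts one operand to float before dividing; `//` is shown for comparison as explicit floor division. The `mean` helper and the word counts are hypothetical.

```python
def mean(values):
    """Arithmetic mean; float() forces true division on Python 2 as well as 3."""
    values = list(values)
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / float(len(values))


word_counts = [3, 4]  # hypothetical per-sentence word counts
print(sum(word_counts) // len(word_counts))  # floor division: 3
print(mean(word_counts))                     # true division: 3.5
```

On Python 3 alone, plain `/` already performs true division, so the `float()` call is only needed for Python 2 compatibility.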
