Counting specified words

I want to count the number of words starting with 'america' and 'citizen' in the files of the 'inaugural' corpus.

import nltk
from nltk.corpus import inaugural

cfd = nltk.ConditionalFreqDist(
    (target, fileid[:4])
    for fileid in inaugural.fileids()
    for w in inaugural.words(fileid)
    for target in ['america', 'citizen']
    if w.lower().startswith(target))

year = ['1789', '1793'] 
word = ['america', 'citizen'] 
cfd.tabulate(conditions=year, samples=word)

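It does not count the words correctly. What is wrong? Note: I want 'america' and 'citizen' as the columns and the years as the rows. The output I get is: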

     america citizen
1789       0       0
1793       0       0

Source

2015-02-07 user2064809

Use the 'count()' function. Google -> Python count function -> results. – GLHF 2015-02-07 17:52:34

Please fix the indentation of your posted code. – wwii 2015-02-07 17:55:52

Your conditions and samples are in the reverse order. The ConditionalFreqDist constructor expects (condition, sample) pairs, but you are giving it (sample, condition). Try:

import nltk
from nltk.corpus import inaugural

# Each pair must be (condition, sample): here (year, target word).
cfd = nltk.ConditionalFreqDist(
    (fileid[:4], target)
    for fileid in inaugural.fileids()
    for w in inaugural.words(fileid)
    for target in ['america', 'citizen']
    if w.lower().startswith(target))

A = ['1789', '1793']
B = ['america', 'citizen']
cfd.tabulate(conditions=A, samples=B)

Output

     america citizen
1789       2       5
1793       1       1
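
To see why the order inside each pair matters, here is a minimal sketch that mimics the grouping using only the standard library. It is an illustration, not NLTK's implementation: the `group_pairs` and `make_pairs` helpers and the toy `speeches` data are made up for this example. The first element of each pair becomes the row key (the condition) and the second becomes the column key (the sample).

```python
from collections import Counter, defaultdict

# Toy stand-in for the corpus: hypothetical file ids mapped to a few words.
speeches = {
    "1789-Washington.txt": ["Citizens", "of", "America", "citizen"],
    "1793-Washington.txt": ["American", "citizens"],
}
targets = ["america", "citizen"]


def group_pairs(pairs):
    """Group (condition, sample) pairs into one Counter per condition."""
    table = defaultdict(Counter)
    for condition, sample in pairs:
        table[condition][sample] += 1
    return table


def make_pairs(year_first):
    """Yield pairs in either (year, target) or (target, year) order."""
    for fileid, words in speeches.items():
        for w in words:
            for target in targets:
                if w.lower().startswith(target):
                    year = fileid[:4]
                    yield (year, target) if year_first else (target, year)


right = group_pairs(make_pairs(year_first=True))
wrong = group_pairs(make_pairs(year_first=False))

# Year as condition: the lookup finds the counts.
print(right["1789"]["america"], right["1789"]["citizen"])
# Target as condition: there is no '1789' condition, so the count is 0.
print(wrong["1789"]["america"])
```

With the pairs reversed, the years end up as samples and the target words as conditions, so asking for a year as a condition finds nothing. That is consistent with the zeros in the question's table.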

In general, you would want to use a stemmer, which gives something like:

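import nltk
from nltk.corpus import inaugural
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer('english')
# Count every stemmed word per year, not just a fixed list of targets.
cfd = nltk.ConditionalFreqDist(
    (fileid[:4], stemmer.stem(word))
    for fileid in inaugural.fileids()
    for word in inaugural.words(fileid))

A = ['2009', '2005']
# Stem the query words too, so they match the stemmed samples.
B = [stemmer.stem(i) for i in ['freedom', 'war']]
cfd.tabulate(conditions=A, samples=B)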

which results in the output

     freedom war
2009       3   2
2005      27   0
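
The way words are matched is also one reason a manual count can disagree with the program. Prefix matching with `startswith` counts every word that begins with the target, while counting by eye usually means exact matches. A small sketch with made-up tokens, standard library only:

```python
# Hypothetical tokens, chosen to show the difference between the two rules.
tokens = ["Citizens", "citizen", "citizenship", "America", "American", "americas"]
lowered = [t.lower() for t in tokens]
target = "citizen"

# Exact match: only the word itself.
exact = sum(1 for t in lowered if t == target)
# Prefix match: also 'citizens' and 'citizenship'.
prefix = sum(1 for t in lowered if t.startswith(target))

print(exact, prefix)
```

A stemmer groups inflected forms by a rule-based stem rather than by raw prefix, so its counts can differ from both of these.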

Source

2015-02-07 18:28:47

Interestingly, the OP's post matches the example in the documentation (http://www.ling.helsinki.fi/kit/2009s/clt231/NLTK/book/ch02-AccessingTextCorporaAndLexicalResources.html#ref-first-four-chars), which uses '(target, fileid[:4])'. – wwii 2015-02-07 18:41:10

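Your code works better. But I don't know why it doesn't always give me the correct result. I counted a specific word by eye, and the program gives a different result! (That is, sometimes the program gives the wrong result.) – user2064809 2015-02-07 18:49:47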

Here is the approach: you can use the count function:

print (mystring.count("specificword"))

Demo:

mystring = "hey hey hi hello hey hello hi" 
print (mystring.count("hey")) 

>>> 
3 
>>>

The rest is up to you. Displaying the counts like a table is basically a matter of formatting them with the print function. Another demo:

mystring = "hey hey hi hello hey hello hi" 

a = mystring.count("hey") 
b = mystring.count("hi") 
c = mystring.count("hello") 

obj = """hey: {} 
hi: {} 
hello {}""" 

print (obj.format(a,b,c))

Output:

>>> 
hey: 3 
hi: 2 
hello 2 
>>>
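
One caveat, sketched below with a made-up sentence: `str.count` counts substring occurrences, not whole words, so a short word is also counted inside longer words. Splitting the text first and counting the tokens with `collections.Counter` avoids that.

```python
from collections import Counter

text = "this is his history"

# Substring count: 'is' also occurs inside 'this', 'his' and 'history'.
substring_hits = text.count("is")

# Whole-word count: split on whitespace, then count the tokens.
word_counts = Counter(text.split())

print(substring_hits)
print(word_counts["is"])
```

Here the substring count is 4, while the whole-word count is 1.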

Source

2015-02-07 18:00:53 GLHF

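You can use nltk.word_tokenize to create a list of words, then use collections.Counter to get a dictionary whose keys are the words and whose values are the word frequencies:

import nltk
from collections import Counter

path = 'speech.txt'  # hypothetical path to a plain-text file

with open(path) as f:
    # Tokenize into words (not sentences) so the Counter keys are single words.
    C = Counter(nltk.word_tokenize(f.read().lower()))

B = ['america', 'citizen']
for i in B:
    print(C[i])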

Source

2015-02-07 18:01:40 Kasramvd
