2014-08-28

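I want to plot the character distribution of a text file, but I realized I also need to include the digits 0-9 as well as _ and -. The MATLAB code below computes the distribution of the letters: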

f = fopen('c:\nouns.txt'); 
ns = textscan(f, '%s'); 
fclose(f); 
%// Convert everything to chars 
letters_char = reshape(char(ns{:}),[],1); 

%// Get the case-insensitive count of each letter 
count_lettters = sum(bsxfun(@eq,letters_char,97:122),1) + ... 
    sum(bsxfun(@eq,letters_char,65:90),1); 

%// Plot the normalized counts as a bar chart labelled a-z 
bar(count_lettters./sum(count_lettters)) 
set(gca, 'XTickLabel',cellstr(char(97:122)'),'XTick',1:26) 

This computes and plots the distribution of the letters a-z. I want to include a-z as well as 0-9, - and _. Any suggestions?
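To see how the counting line works, here is a tiny worked example on a hypothetical four-character input (`chars`, `hits` and `counts` are illustrative names, not from the code above). `@eq` is a handle to MATLAB's built-in `eq` function, the functional form of `==`, and `bsxfun` expands the N-by-1 column of characters against the 1-by-K row of character codes, giving an N-by-K logical match matrix whose column sums are the per-code counts:

```matlab
%// Hypothetical sample characters, one per row (as produced by reshape(...,[],1))
chars = ['a'; 'b'; 'a'; 'B'];

%// Compare every character with the codes for 'a', 'b', 'c' (97:99);
%// row i, column j is true when chars(i) == 96 + j
hits = bsxfun(@eq, chars, 97:99);

%// Column sums give the count of each code: [2 1 0]
%// The upper-case 'B' is not matched, which is why the code above
%// adds a second pass over the upper-case codes 65:90
counts = sum(hits, 1)
```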

Please provide a minimal working example and be more precise about the problem you are running into. – fuesika 2014-08-28 20:06:04

Is that enough, or do you need more details? – user2085339 2014-08-28 20:13:03

Try running just the part you provided... I think at least the definition of '@eq' is still missing. – fuesika 2014-08-28 20:14:28

Answer

Code:

f = fopen(path_to_text_file); 
ns = textscan(f, '%s'); 
fclose(f); 

%// Convert everything to chars 
letters_char = reshape(char(ns{:}),[],1); 

%// Get the case-insensitive count of each letter 
count_lettters = sum(bsxfun(@eq,letters_char,97:122),1) + ... 
    sum(bsxfun(@eq,letters_char,65:90),1); 

%// Count the digits 0-9 (ASCII codes 48-57) 
count_numbers = sum(bsxfun(@eq,letters_char,48:57),1); 

%// Count underscores and hyphens 
underscore_c = sum(letters_char=='_'); 
hyphen_c = sum(letters_char=='-'); 

%// Combine all counts in the same order as the tick labels below 
counts = [underscore_c hyphen_c count_numbers count_lettters]; 

xtickstr = [{'_'; '-'}; cellstr(num2str((0:9)')); cellstr(char(97:122)')]; 
bar(counts./sum(counts)) 
set(gca, 'XTickLabel',xtickstr,'XTick',1:numel(xtickstr)) 

xlabel('ASCII Characters') 
ylabel('Probability Distribution') 

[Figure: output plot for a typical text file]
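The same distribution can also be computed in a single pass by listing every character of interest in one string and mapping each character of the text to its position in that string. This is a sketch under stated assumptions, not the answer's method: it assumes the whole file can be read with `fileread` and folded to lower case with `lower`, and `alphabet`, `chars`, `found`, `idx` and the sample text are hypothetical names chosen for illustration. `ismember` returns each character's position in `alphabet` (0 if absent) and `accumarray` tallies those positions, so spaces and any other characters not in `alphabet` are ignored.

```matlab
%// Hypothetical sample text; with a real file this would be
%// txt = fileread(path_to_text_file);
txt = 'Foo_bar-42 baz';

%// All characters to count, in the order they should appear on the x-axis
alphabet = ['_-', '0':'9', 'a':'z'];

%// Fold upper case onto lower case so the letter counts are case-insensitive
chars = lower(txt(:));

%// Position of each character in alphabet; found is false for other characters
[found, idx] = ismember(chars, alphabet);

%// Tally the positions into one count per alphabet entry (row vector)
counts = accumarray(idx(found), 1, [numel(alphabet) 1])';

bar(counts ./ sum(counts))
set(gca, 'XTick', 1:numel(alphabet), 'XTickLabel', cellstr(alphabet'))
xlabel('ASCII Characters')
ylabel('Probability Distribution')
```

Because the tick labels are built from the same `alphabet` string as the counts, adding another character to the distribution only requires changing that one line.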