Counting words in a cell array in MATLAB

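I have a 500x1 cell array, and each row holds one word. How can I count how many times each word occurs and display the result, together with the percentage of each occurrence?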

For example, the occurrences of these words are:

Ans = 

    200 Green 
    200 Red 
    100 Blue

The percentages of these words:

Ans = 

    40% Green 
    40% Red 
    20% Blue

Source

2012-07-19 Kirsty White

Do you already have a list of the unique words that appear in the original 500x1 cell array? – 2012-07-19 07:17:48

Actually, I just found a great solution that covers your question: [@Peter's answer](http://stackoverflow.com/a/13593029/1705967) – 2012-11-27 22:15:16

The idea is that strcmpi compares cell arrays elementwise. This can be used to compare the input names against the unique names in the input. Try the code below.

% generate some input 
input={'green','red','green','red','blue'}'; 

% find the unique elements in the input 
uniqueNames=unique(input)'; 

% use string comparison ignoring the case 
occurrences=strcmpi(input(:,ones(1,length(uniqueNames))),uniqueNames(ones(length(input),1),:)); 

% count the occurrences 
counts=sum(occurrences,1); 

%pretty printing 
for i=1:length(counts) 
    disp([uniqueNames{i} ': ' num2str(counts(i))]) 
end

I leave the percentage calculation to you.
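One possible way to finish the percentage step, as a minimal sketch rather than the answerer's own code. The variable names `words` and `percentages` and the sample data are assumptions made for illustration; the cell array is lower-cased before `unique` so that the case-insensitive `strcmpi` count and the list of unique words agree.

```matlab
% Hypothetical sample data with the same shape as the input in the answer
words = {'green','red','green','red','blue'}';

% Unique words after case normalisation (unique is case-sensitive)
uniqueNames = unique(lower(words));

% Count the occurrences of each unique word, ignoring case
counts = zeros(numel(uniqueNames), 1);
for k = 1:numel(uniqueNames)
    counts(k) = sum(strcmpi(words, uniqueNames{k}));
end

% Share of each word in percent of all entries
percentages = 100 * counts / numel(words);

% Print word, count and percentage on one line each
for k = 1:numel(uniqueNames)
    fprintf('%s: %d (%.0f%%)\n', uniqueNames{k}, counts(k), percentages(k));
end
```

Dividing by `numel(words)` rather than `sum(counts)` makes it clear that the percentage is taken over all entries of the original cell array.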

Source

2012-07-19 07:25:08 denahiro

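I would convert everything to a single case, lower or upper, before the occurrences line. I would do this first: input = lower(input); that returns all strings in lower case. It makes things easier, because mismatches can occur when the case differs. Just a random opinion – user2867655 2014-02-17 10:52:22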
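A small sketch of why the case normalisation matters, using made-up data (the name `mixed` is hypothetical): `unique` treats strings that differ only in case as different words, while `strcmpi` does not, so the two can disagree unless the input is lower-cased first.

```matlab
% Hypothetical input containing the same words in different case
mixed = {'Green','green','RED','red'}';

% unique is case-sensitive, so every spelling counts as its own word
nRaw = numel(unique(mixed));

% After lower-casing, only the two distinct words remain
nNormalised = numel(unique(lower(mixed)));

fprintf('unique words before: %d, after lower(): %d\n', nRaw, nNormalised);
```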

First find the unique words in the data:

% set up sample data: 
data = [{'red'}; {'green'}; {'blue'}; {'blue'}; {'blue'}; {'red'}; {'red'}; {'green'}; {'red'}; {'blue'}; {'red'}; {'green'}; {'green'}; ] 
uniqwords = unique(data);

Then find where each of these unique words occurs in the data:

[~,uniq_id]=ismember(data,uniqwords);

Then simply count how many times each unique word was found:

uniq_word_num = arrayfun(@(x) sum(uniq_id==x),1:numel(uniqwords));

To get the share of each word, divide by the total number of data samples (multiply by 100 for a percentage):

uniq_word_perc = uniq_word_num/numel(data)
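The same steps can also be sketched with the third output of `unique` and `accumarray`, which replaces the `ismember` and `arrayfun` calls. This is an alternative under stated assumptions, not the answerer's method; the names `idx`, `counts` and `perc` and the shortened sample data are made up for illustration.

```matlab
% Hypothetical sample data, shorter than the list in the answer
data = {'red'; 'green'; 'blue'; 'blue'; 'red'};

% idx(k) is the position of data{k} in the sorted list of unique words
[uniqwords, ~, idx] = unique(data);

% Add up a 1 for every entry that maps to the same unique word
counts = accumarray(idx(:), 1);

% Percentage of each unique word over all samples
perc = 100 * counts / numel(data);

for k = 1:numel(uniqwords)
    fprintf('%d (%.0f%%) %s\n', counts(k), perc(k), uniqwords{k});
end
```

`idx(:)` forces a column vector, which `accumarray` expects regardless of whether the MATLAB release returns the index as a row or a column.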

Source

2012-07-19 07:30:16

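Gunther, how would you calculate the percentages for denahiro's answer? – 2012-07-19 07:55:27

The same way as here: divide the resulting counts by the total number of samples – 2012-07-19 08:02:31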

Here is my solution, which should be fairly fast.

% example input 
example = 'This is an example corpus. Is is a verb?'; 
words = regexp(example, ' ', 'split'); 

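% your program: the results end up in vocabulary and counts (input is a cell array called words) 
vocabulary = unique(lower(words)); % lower-case first so that unique agrees with the case-insensitive strcmpi below 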
n = length(vocabulary); 
counts = zeros(n, 1); 
for i=1:n 
    counts(i) = sum(strcmpi(words, vocabulary{i})); 
end 

%process results 
[val, idx]=max(counts); 
most_frequent_word = vocabulary{idx}; 

%percentages: 
percentages=counts/sum(counts);

Source

2012-11-27 20:52:26

A tricky way that avoids explicit for loops:

clc 
close all 
clear all 

Paragraph=lower(fileread('Temp1.txt')); 

AlphabetFlag=Paragraph>=97 & Paragraph<=122; % true for the letters a-z 

DelimFlag=find(AlphabetFlag==0); % treat every non-letter as a delimiter 
WordLength=[DelimFlag(1), diff(DelimFlag)]; % letters before each delimiter, plus one 
Paragraph(DelimFlag)=[]; % remove the delimiters 
% mat2cell needs the lengths to add up to numel(Paragraph), so the text must end with a non-letter 
Words=mat2cell(Paragraph, 1, WordLength-1); % cut the paragraph into words 

[SortWords, Ia, Ic]=unique(Words); % unique words and the index of each word into them 

Bincounts = histc(Ic,1:size(Ia, 1)); % number of occurrences of each unique word 
[SortBincounts, IndBincounts]=sort(Bincounts, 'descend'); % order by count, most frequent first 

FreqWords=SortWords(IndBincounts); % sort words according to their frequency 
FreqWords(1)=[];SortBincounts(1)=[]; % drop the first entry, assumed to be the empty word left by adjacent delimiters 

Freq=SortBincounts/sum(SortBincounts)*100; % frequency percentage 

%% plot 
NMostCommon=20; 
disp(Freq(1:NMostCommon)) 
pie([Freq(1:NMostCommon); 100-sum(Freq(1:NMostCommon))], [FreqWords(1:NMostCommon), {'other words'}]);
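For comparison, the tokenising step could also be sketched with `regexp` and its `'match'` option, which returns the runs of letters directly and so produces no empty words. This is an alternative under stated assumptions, not the answerer's code: the inline text stands in for Temp1.txt, and the names `txt`, `vocab`, `sortedWords` and `freq` are hypothetical.

```matlab
% Hypothetical text used in place of the contents of Temp1.txt
txt = lower('The cat and the hat. The end!');

% Every maximal run of the letters a-z becomes one word
words = regexp(txt, '[a-z]+', 'match');

% Unique words and, for each word, its index into the unique list
[vocab, ~, idx] = unique(words);

% Occurrences per unique word
counts = accumarray(idx(:), 1);

% Most frequent words first
[sortedCounts, order] = sort(counts, 'descend');
sortedWords = vocab(order);

% Frequency of each word in percent of all words
freq = 100 * sortedCounts / numel(words);

fprintf('most frequent word: %s (%.1f%%)\n', sortedWords{1}, freq(1));
```

Because no empty strings are produced, nothing has to be dropped from the sorted list before computing the percentages.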

Source

2014-03-03 09:08:11 zhao
