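Probability of a word ending in a given letter?

I have a text file of about 9000 lowercase words. I want to find, for each letter, the probability that it is the last letter of a word (frequency of the letter / number of words).

Here is my first attempt: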

function [ prma ] = problast() 
counts = zeros(1,26); 
%refer to cell index here to get alphabetic number of char 
s = regexp('abcdefghijklmnopqrstuvwxyz','.','match'); 
f = fopen('nouns.txt'); 
ns = textscan(f,'%s'); 
fclose(f); 
%8960 is the length of the file 
for i =1:8960 
c = substr(ns(i),-1,1); 
num = find(s == c); 
counts(num) = num; 
end 
prma = counts/8960; 
disp(prma);

This gives me the following error:

Undefined function 'substr' for input arguments of type 'cell'.

Any ideas?

2013-04-02 user1892115

First of all, you don't need regexp for this problem. A very simple and effective way to solve it is:

clear; 
close; 
clc; 

counts = zeros(1,26); 

f = fopen('nouns.txt'); 
ns = textscan(f,'%s'); 
fclose(f); 

for i =1:numel(ns{1}) 
    c = ns{1}{i}(end);                  % last character of the i-th word 
    counts(c-96) = counts(c-96)+1;      % 'a' is ASCII 97, so 'a'..'z' map to 1..26 
end 

prma = counts/numel(ns{1}); 
disp(prma);

For example, if "nouns.txt" contains

paris 
london

the output will be:

Columns 1 through 8 

     0   0   0   0   0   0   0   0 

    Columns 9 through 16 

     0   0   0   0   0 0.5000   0   0 

    Columns 17 through 24 

     0   0 0.5000   0   0   0   0   0 

    Columns 25 through 26 

     0   0

2013-04-02 08:29:20

One could argue about the efficiency of using a for loop. You could use a histogram instead (see Shai's solution).

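The textscan documentation states that the result is a cell array. If you are not familiar with cell arrays, I strongly recommend reading the link I gave, but the long and short of it is that your code should look like this: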

c = substr(ns{i},-1,1);

Note the change from () to {}: this is how you access the elements of a cell array.
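
To make the () versus {} distinction concrete, here is a minimal sketch. The cell array C and its contents are made-up sample data, not taken from the question's file.

```matlab
C = {'paris', 'london'};   % made-up sample data
a = C(1);                  % () returns a 1x1 cell that contains 'paris'
b = C{1};                  % {} returns the char array 'paris' itself
disp(class(a));            % cell
disp(class(b));            % char
disp(b(end));              % s
```

Functions that expect a string generally need the {} form, since the () form hands them a cell instead.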

2013-04-02 07:45:41 jazzbassrob

I changed the parentheses to curly braces, but I get the same error as above. Am I doing something else wrong? – user1892115

I'm not sure what is causing the problem, but this should do the trick, assuming ns{i} contains your string:

str = ns{i}; 
c = str(end);

If this doesn't work, it shouldn't be too hard to play around a bit and create the variable str based on ns.
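
One plausible explanation for the repeated error (an assumption on my part, the thread does not confirm it): textscan with '%s' returns a 1x1 cell whose only element is itself a cell array of strings, so ns{1} is still a cell and each word sits one level deeper. Also, as far as I know, base MATLAB has no substr function, so indexing with end sidesteps it. A small sketch, using inline text as a stand-in for nouns.txt:

```matlab
% Inline sample text stands in for the contents of nouns.txt (assumption).
ns = textscan(sprintf('paris\nlondon\n'), '%s');

disp(class(ns{1}));   % cell: ns{1} is the whole list of words, not one word
str = ns{1}{2};       % second word, the char array 'london'
c = str(end);         % last character of that word, 'n'
disp(c);
```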

2013-04-02 08:27:35

How about:

f = fopen('nouns.txt'); 
ns = textscan(f, '%s'); 
fclose(f); 

num = cellfun(@(x)(x(end) - 'a' + 1), ns{:}); %// Convert to 1-26 
counts = hist(num, 1:26);      %// Count occurrences 
prob = counts/numel(ns{:})     %// Compute probabilities
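
For anyone who wants to try the idea without a file, here is a self-contained sketch of the same counting step. It swaps hist for accumarray, which is a different function from the one used above, and the word list is made-up sample data.

```matlab
words = {'paris'; 'london'; 'berlin'; 'athens'};   % made-up sample data

num = cellfun(@(w) w(end) - 'a' + 1, words);       % last letter mapped to 1..26
counts = accumarray(num, 1, [26 1])';              % tally per letter as a 1x26 row
prob = counts / numel(words);                      % relative frequencies

% Print only the letters that actually occur as a last letter.
for k = find(prob)
    fprintf('%c: %.2f\n', 'a' + k - 1, prob(k));
end
```

With these four words the loop prints n: 0.50 and s: 0.50, since two words end in n and two in s.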

2013-04-02 08:30:15 Shai

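'textscan' already tokenizes the words, so why use 'regexp'? Also, I think you need '[^a-z]*' rather than '[^a-z]' in the pattern...

Oh, and I believe it should be 'x(end)' rather than 'x(1)', since the question asks for the probability of the last letter in a word, not the first. I took the liberty of editing your solution...

@EitanT it was 'x(1)' back when I was using 'regexp' to strip out everything but the last letter – Shai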

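Thanks for the suggestions, everyone. I solved the problem on my own, but I went back and tried the last answer and it works perfectly. This is what I came up with: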

%Keep track of counts 
counts = zeros(1,26); 
%Refer to this array to get alphabetic numeric value of character 
s = regexp('abcdefghijklmnopqrstuvwxyz','.','match'); 
f = fopen('nouns.txt'); 
ns = textscan(f,'%s'); 
fclose(f); 
%8960 = length of nouns.txt 
for i =1:8960 
    %string from ns 
    str = ns{1}{i}; 
    %last character in that string 
    c = str(length(str)); 
    %index in s 
    temp = strfind(s,c); 
    index = find(not(cellfun('isempty',temp))); 
    counts(index) = counts(index)+1; 
end 

%Get probabilities 
prma = counts/8960; 
disp(prma);

I've upvoted everyone for helping me brainstorm.
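
A possible refinement of the loop above, offered only as a sketch: take the word count from the data instead of hard-coding 8960, and skip any word whose last character is not a lowercase letter. The sample words, including the deliberately odd 'rio-', are hypothetical, and dividing by the number of words actually counted is my own choice rather than something from the thread.

```matlab
words = {'paris'; 'london'; 'rio-'};   % hypothetical data; 'rio-' ends in a non-letter
counts = zeros(1, 26);

for i = 1:numel(words)                 % loop bound comes from the data, not a constant
    c = words{i}(end);                 % last character of the i-th word
    if c >= 'a' && c <= 'z'            % ignore anything outside a..z
        idx = c - 'a' + 1;             % 'a'..'z' map to 1..26
        counts(idx) = counts(idx) + 1;
    end
end

prma = counts / sum(counts);           % normalise by the words that were counted
disp(prma);
```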

2013-04-02 08:41:21 user1892115
