Comparing cell arrays in MATLAB

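I have two cell arrays whose cells store the unigrams and bigrams that I extracted from a text file. Now I have to compare each unigram with the bigrams to find how many times each unigram occurs in the bigrams, and later the probability. Can anyone help me sort this out? I have used strcmp, but it is not working. My code is below: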

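for j = 1:bigramRow
    % Split the bigram into its tokens; <s> and </s> are sentence markers
    tokens = regexp(splitBigramCellsA{j}, '<s>|</s>|\w+', 'match');
    if isempty(tokens), continue; end

    % Compare the unigrams with the first word of the bigram, not the whole
    % bigram string. strcmp returns a logical vector, one entry per unigram.
    match = strcmp(splitUnigramCellsA, tokens{1});

    if any(match)
        bigram1count  = splitBigramCellsB{j};
        % Index the unigram counts by the matching unigram, not by j
        unigram1count = splitUnigramCellsB{find(match, 1)};
        disp(bigram1count)
        disp(unigram1count)
    end
end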

Source

2016-01-20 Seema

Can you explain what unigrams and bigrams are? What does splitBigramCells contain? – Jonas

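Unigrams are the unique words in a sentence. Bigrams are two words taken at a time. For example, 'it is a lovely day' contains the bigrams 'it is', 'is a', 'a lovely', 'lovely day'. – Seema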

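If you can fit the text in memory, you can do the following:

Create a cell array of all the words (in order).
Call unique on the cell array and capture the third output. This is the original text expressed as an array of indices, where each index points to a unigram.
Create all bigrams as bigrams = [indices(1:2:largestEven),indices(2:2:largestEven);indices(2:2:largestOdd),indices(3:2:largestOdd)], where largestEven is 2*floor(length(indices)/2) and largestOdd is 2*floor((length(indices)-1)/2)+1. Up to row order, this is the same as [indices(1:end-1),indices(2:end)] for a column vector of indices.
Count how often each unigram occurs in the bigrams, e.g. with tabulate(bigrams(:)).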
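
The steps above can be sketched as follows. This is only a minimal sketch under a few assumptions, not necessarily what the answer had in mind: the sample words are hypothetical, the bigrams are built with the shifted-index form, accumarray is swapped in for tabulate (which needs the Statistics and Machine Learning Toolbox), and the probability is assumed to be the conditional estimate count(first second) / count(first).

```matlab
% Hypothetical sample text; in practice the words would come from the file.
words = {'<s>','it','is','a','lovely','day','</s>', ...
         '<s>','it','is','a','day','</s>'}';

% Unique words, and the text as a column of indices into them.
[unigrams, ~, indices] = unique(words);
indices = indices(:);

% Every pair of consecutive words, one bigram per row.
% Note: this also pairs '</s>' with the '<s>' of the next sentence.
bigrams = [indices(1:end-1), indices(2:end)];

% Unigram counts: how often each unique word occurs in the text.
unigramCounts = accumarray(indices, 1);

% Bigram counts: collapse identical rows and count them.
[uniqueBigrams, ~, bigramIdx] = unique(bigrams, 'rows');
bigramCounts = accumarray(bigramIdx, 1);

% Assumed estimate: P(second | first) = count(first second) / count(first).
bigramProb = bigramCounts ./ unigramCounts(uniqueBigrams(:, 1));

% Print each bigram with its count and probability.
for k = 1:size(uniqueBigrams, 1)
    fprintf('%s %s: count = %d, P = %.2f\n', ...
        unigrams{uniqueBigrams(k, 1)}, unigrams{uniqueBigrams(k, 2)}, ...
        bigramCounts(k), bigramProb(k));
end
```

Working with integer indices instead of strings means no strcmp calls are needed inside a loop; the strings are only looked up again when printing.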

Source

2016-01-20 13:11:21 Jonas
