2014-03-26 53 views
1

開平的概率,我有一個單元陣列,第2欄是的「XX-> XX」的時間,例如:計數在單元陣列

'AA->AA' [21] [4.2084] 
'AA->AC' [15] [3.0060] 
'AA->AG' [ 9] [1.8036] 
'AA->AT' [12] [2.4048] 
'AC->CA' [14] [2.8056] 
'AC->CC' [16] [3.2064] 
'AC->CG' [ 5] [1.0020] 
'AC->CT' [ 3] [0.6012] 
'AG->GA' [11] [2.2044] 
'AG->GC' [ 5] [1.0020] 
'AG->GG' [ 8] [1.6032] 
'AG->GT' [13] [2.6052] 
'AT->TA' [10] [2.0040] 
'AT->TC' [ 8] [1.6032] 
'AT->TG' [ 2] [0.4008] 
'AT->TT' [11] [2.2044] 
'CA->AA' [17] [3.4068] 
'CA->AC' [ 7] [1.4028] 
'CA->AG' [ 9] [1.8036] 
'CA->AT' [11] [2.2044] 
'CC->CA' [15] [3.0060] 
'CC->CC' [ 5] [1.0020] 
'CC->CG' [ 4] [0.8016] 
'CC->CT' [17] [3.4068] 
'CG->GA' [ 1] [0.2004] 
'CG->GC' [ 2] [0.4008] 
'CG->GG' [ 9] [1.8036] 
'CG->GT' [ 3] [0.6012] 
'CT->TA' [ 7] [1.4028] 
'CT->TC' [ 9] [1.8036] 
'CT->TG' [ 9] [1.8036] 
'CT->TT' [ 2] [0.4008] 
'GA->AA' [10] [2.0040] 
'GA->AC' [ 4] [0.8016] 
'GA->AG' [10] [2.0040] 
'GA->AT' [ 2] [0.4008] 
'GC->CA' [ 2] [0.4008] 
'GC->CC' [ 7] [1.4028] 
'GC->CG' [ 6] [1.2024] 
'GC->CT' [ 3] [0.6012] 
'GG->GA' [ 6] [1.2024] 
'GG->GC' [ 6] [1.2024] 
'GG->GG' [ 4] [0.8016] 
'GG->GT' [ 8] [1.6032] 
'GT->TA' [ 6] [1.2024] 
'GT->TC' [11] [2.2044] 
'GT->TG' [ 8] [1.6032] 
'GT->TT' [ 5] [1.0020] 
'TA->AA' [ 8] [1.6032] 
'TA->AC' [13] [2.6052] 
'TA->AG' [ 9] [1.8036] 
'TA->AT' [ 6] [1.2024] 
'TC->CA' [13] [2.6052] 
'TC->CC' [13] [2.6052] 
'TC->CT' [ 4] [0.8016] 
'TG->GA' [ 8] [1.6032] 
'TG->GC' [ 5] [1.0020] 
'TG->GG' [ 3] [0.6012] 
'TG->GT' [ 6] [1.2024] 
'TT->TA' [13] [2.6052] 
'TT->TC' [ 2] [0.4008] 
'TT->TG' [ 3] [0.6012] 
'TT->TT' [ 5] [1.0020] 

現在,我試圖計算概率:P('AA-> AA')= TIMES('AA-> AA')/ SUM('AA-> AA','AA-> AC','AA-> AG','AA-> AT '),換句話說,P('AA-> AA')= TIMES('AA-> AA')/ SUM('AA->任何人)。與其他人一樣。我想用一個循環做到這一點,但在

'TC->CA' [13] [2.6052] 
'TC->CC' [13] [2.6052] 
'TC->CT' [ 4] [0.8016] 

以及一個極值情況下,顯然是「TC-> CG」的次數爲0,這甚至需要,也可考慮我們已經知道概率應該是0.當然,這個極值情況可能發生在其他任何一個類似的情況下,有時可能缺少'TT-> TT'或有時缺少'TC-> CT'。 任何人都知道如何做到這一點? 謝謝。

+0

所以輸出將是一個概率單元陣列,但對於'TC-> CG'這樣的情況,在輸入單元陣列中沒有提到的情況下,行數更多? – Divakar

+0

是的,應該是。 – Jack2007

+0

缺少多少個案例?大約我的意思是,它像10個或更多像100? – patrik

回答

1

試試這個 -

%%// Get the cell data into data1 
data1 = INPUT_DATA; 

%%// Get the data from columns separately 
col1 = data1(:,1); 
tag_data = vertcat(col1{:}); 

col2 = data1(:,2); 
times_data = vertcat(col2{:}); 

col3 = data1(:,3); 
col3_data = vertcat(col3{:}); 

%%// Get full data for tag, times and column3 
char_array = ['A' 'C' 'G' 'T']; 
full_tag_data = char_array(combinator(4,3,'p','r')); 
full_tag_data = [full_tag_data(:,1:2) repmat('->',[size(full_tag_data,1) 1]) full_tag_data(:,2:3)]; 

present_rows = ismember(full_tag_data,tag_data,'rows'); 
full_times_data = double(present_rows); 
full_times_data(present_rows) = times_data; 

full_col3_data = double(present_rows); 
full_col3_data(present_rows) = col3_data; 

%%// Get the sum values 
full_col3_data_summed = sum(reshape(full_col3_data,4,[]),1); 
full_col3_data_summed = reshape(repmat(full_col3_data_summed,[4 1]),[],1); 

%%// Store the required values into a cell array out_cell1 
out_cell1 = cell(size(present_rows,1),2); 
out_cell1(:,1) = cellstr(full_tag_data); 
out_cell1(:,2) = num2cell(full_times_data); 
out_cell1(:,3) = num2cell(full_col3_data); 

%%// The probabilities are added into the cell array as the fourth column 
out_cell1(:,4) = num2cell(full_times_data./full_col3_data_summed); 

注:上面的代碼使用函數combinator,這是可以here