2016-08-02 41 views
0

我需要計算列表中存在其相應的unigrams的二元組的概率。例如,期望的結果是在下面的列表中,'pretty girl', 'pretty', 'girl'都存在。因此,該概率是,通過在列表P使用的值,(0.0017) % (0.003 * 0.002) = 5.999999999999987e-06在列表中選擇特殊元素並計算條件概率

S = ['girl', 'pretty', 'pretty girl', 'our', 'our world', 'wide', 'word', 'yes', 'yike', 'yummy'] 

P = [0.003, 0.002, 0.0017, 0.003, 0.006, 0.004, 0.002, 0.012, 0.006, 0.003] 

我有以下代碼。它似乎沒有給我結果,因此我不能繼續計算概率。我試圖用這個代碼做的是在列表中選擇bigrams並找到它們相應的unigrams。然後我打算在P中匹配他們的概率。

In [60]: import re 
In [61]: M = [] 
In [62]: for i in range(len(S)): 
      s_split = S[i].split() 
      s_split_len = len(S[i].split()) 
      if s_split_len == 2: 
       m = [] 
       a = re.match(s_split[0], S[i]) 
       b = re.match(s_split[1], S[i]) 
       m.append(a) 
       m.append(b) 
       M.append(m) 
       print M 

[[<_sre.SRE_Match object at 0x10447b988>, None], [<_sre.SRE_Match object at 0x10447b8b8>, None], [<_sre.SRE_Match object at 0x10447b920>, None], [<_sre.SRE_Match object at 0x10447b9f0>, None], [<_sre.SRE_Match object at 0x10447bac0>, None], [<_sre.SRE_Match object at 0x10447bb90>, None], [<_sre.SRE_Match object at 0x10447bbf8>, None], [<_sre.SRE_Match object at 0x10447bc60>, None], [<_sre.SRE_Match object at 0x10447bcc8>, None], [<_sre.SRE_Match object at 0x10447bd30>, None], [<_sre.SRE_Match object at 0x10447bd98>, None], [<_sre.SRE_Match object at 0x10447be00>, None], [<_sre.SRE_Match object at 0x10447be68>, None], [<_sre.SRE_Match object at 0x10447bed0>, None], [<_sre.SRE_Match object at 0x10447bf38>, None], [<_sre.SRE_Match object at 0x1044a8030>, None], [<_sre.SRE_Match object at 0x1044a8098>, None], [<_sre.SRE_Match object at 0x1044a8100>, None], [<_sre.SRE_Match object at 0x1044a8168>, None], [<_sre.SRE_Match object at 0x1044a81d0>, None], [<_sre.SRE_Match object at 0x1044a8238>, None], [<_sre.SRE_Match object at 0x1044a82a0>, None], [<_sre.SRE_Match object at 0x1044a8308>, None], [<_sre.SRE_Match object at 0x1044a8370>, None], [<_sre.SRE_Match object at 0x1044a83d8>, None], [<_sre.SRE_Match object at 0x1044a8440>, None], [<_sre.SRE_Match object at 0x1044a84a8>, None], [<_sre.SRE_Match object at 0x1044a8510>, None], [<_sre.SRE_Match object at 0x1044a8578>, None], [<_sre.SRE_Match object at 0x1044a85e0>, None], [<_sre.SRE_Match object at 0x1044a8648>, None], [<_sre.SRE_Match object at 0x1044a86b0>, None], [<_sre.SRE_Match object at 0x1044a8718>, None]] 
[[<_sre.SRE_Match object at 0x10447b988>, None], [<_sre.SRE_Match object at 0x10447b8b8>, None], [<_sre.SRE_Match object at 0x10447b920>, None], [<_sre.SRE_Match object at 0x10447b9f0>, None], [<_sre.SRE_Match object at 0x10447bac0>, None], [<_sre.SRE_Match object at 0x10447bb90>, None], [<_sre.SRE_Match object at 0x10447bbf8>, None], [<_sre.SRE_Match object at 0x10447bc60>, None], [<_sre.SRE_Match object at 0x10447bcc8>, None], [<_sre.SRE_Match object at 0x10447bd30>, None], [<_sre.SRE_Match object at 0x10447bd98>, None], [<_sre.SRE_Match object at 0x10447be00>, None], [<_sre.SRE_Match object at 0x10447be68>, None], [<_sre.SRE_Match object at 0x10447bed0>, None], [<_sre.SRE_Match object at 0x10447bf38>, None], [<_sre.SRE_Match object at 0x1044a8030>, None], [<_sre.SRE_Match object at 0x1044a8098>, None], [<_sre.SRE_Match object at 0x1044a8100>, None], [<_sre.SRE_Match object at 0x1044a8168>, None], [<_sre.SRE_Match object at 0x1044a81d0>, None], [<_sre.SRE_Match object at 0x1044a8238>, None], [<_sre.SRE_Match object at 0x1044a82a0>, None], [<_sre.SRE_Match object at 0x1044a8308>, None], [<_sre.SRE_Match object at 0x1044a8370>, None], [<_sre.SRE_Match object at 0x1044a83d8>, None], [<_sre.SRE_Match object at 0x1044a8440>, None], [<_sre.SRE_Match object at 0x1044a84a8>, None], [<_sre.SRE_Match object at 0x1044a8510>, None], [<_sre.SRE_Match object at 0x1044a8578>, None], [<_sre.SRE_Match object at 0x1044a85e0>, None], [<_sre.SRE_Match object at 0x1044a8648>, None], [<_sre.SRE_Match object at 0x1044a86b0>, None], [<_sre.SRE_Match object at 0x1044a8718>, None], [<_sre.SRE_Match object at 0x1044a8780>, None]] 
+0

感謝@Chris_Rands。這個例子在我的描述中給出(帖子的前四行)。示例數據是列表S和P.代碼的輸出是帖子最後部分中的對象列表。 – achimneyswallow

回答

0

這工作

S = ['girl', 'pretty', 'pretty girl', 'our', 'our world', 'wide', 'world', 'yes', 'yike', 'yummy'] 

P = [0.003, 0.002, 0.0017, 0.003, 0.006, 0.004, 0.002, 0.012, 0.006, 0.003] 


for i in range(len(S)): 
    s_split = S[i].split() 
    s_split_len = len(S[i].split()) 
    if s_split_len == 2: 
     a = S.index(S[i]) 
     b = S.index(S[i].split()[0]) 
     c = S.index(S[i].split()[1]) 
     if a != None: 
      if b != None: 
       co = [a, b, c] 
       probs = P[co[0]], P[co[1]], P[co[2]] 
       print S[i], probs[0] % (probs[1] * probs[2])