Counting words from a list in the first item of another list

Hi, I want to count given phrases from one list inside the first item (position zero) of another list.

list_given_atoms= ['C', 'Cl', 'Br'] 
list_of_molecules= ['C(B2Br)[Cl{H]Cl}P' ,'NAME']

When Python finds a match, it should be saved in a dictionary like this:

countdict = [ 'Cl : 2', 'C : 1', 'Br : 1']

I have already tried

re.findall(r'\w+', list_of_molecules[0])

but it produces results like "B2Br", which is definitely not what I want.

Can someone help me?


2017-12-02 Schoko

*in a dictionary* - but your `countdict` is a list – RomanPerekhrest
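To make the distinction in this comment concrete, here is a small sketch. The names `count_dict` and `count_list` are illustrative and do not come from the question. It shows a real dictionary and one way it could be turned into the list-of-strings layout the question uses:

```python
# A dictionary maps each atom symbol directly to its count.
count_dict = {'Cl': 2, 'C': 1, 'Br': 1}

# The question's `countdict` is a list of formatted strings instead;
# it can be derived from the dictionary if that layout is really needed.
count_list = ['{} : {}'.format(atom, n) for atom, n in count_dict.items()]

print(count_dict['Cl'])
print(count_list)
```

The dictionary form is usually easier to work with, since a count can be looked up by symbol without parsing a string.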

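[a-zA-Z]+ should be used instead of \w+, because \w+ matches both letters and digits, while you are only looking for letters:

import re

list_given_atoms = ['C', 'Cl', 'Br']
list_of_molecules = ['C(B2Br)[Cl{H]Cl}P', 'NAME']
# Split the first molecule into runs of letters only.
molecules = re.findall('[a-zA-Z]+', list_of_molecules[0])
# Count how often each given atom appears among those runs.
final_data = {i: molecules.count(i) for i in list_given_atoms}

Output: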

{'C': 1, 'Br': 1, 'Cl': 2}


2017-12-02 23:26:39 Ajax1234

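I like your answer, but I would change the regex to r'[A-Z][a-z]?', because you will always get one uppercase letter and an optional lowercase letter. –

@BrettBeatty Regarding the regex, '[a-zA-Z]' covers any occurrence in any order, so it does not really matter here. – Ajax1234

Yes, I guess the original string always has something between the atom names. I thought you might eventually run into something like "HCl". –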

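You can use something like this:

>>> import re
>>> from collections import Counter
>>> Counter(re.findall('|'.join(sorted(list_given_atoms, key=len, reverse=True)), list_of_molecules[0]))
Counter({'Cl': 2, 'C': 1, 'Br': 1})

You have to sort the elements by length so that 'Cl' is tried before 'C'. Otherwise every 'Cl' would be matched as a plain 'C'.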
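The effect of the ordering can be checked with a short sketch. The helper name `count_with_alternation` is made up for illustration, and `re.escape` is an added precaution that the original one-liner does not use:

```python
import re
from collections import Counter

list_given_atoms = ['C', 'Cl', 'Br']
molecule = 'C(B2Br)[Cl{H]Cl}P'

def count_with_alternation(atoms, text):
    # Alternatives are tried left to right, so the order of `atoms` matters.
    # re.escape guards against symbols containing regex metacharacters.
    pattern = '|'.join(re.escape(a) for a in atoms)
    return Counter(re.findall(pattern, text))

# Unsorted: 'C' is tried first, so each 'Cl' is consumed as 'C'.
unsorted_counts = count_with_alternation(list_given_atoms, molecule)

# Longest first: 'Cl' and 'Br' are tried before 'C'.
by_length = sorted(list_given_atoms, key=len, reverse=True)
sorted_counts = count_with_alternation(by_length, molecule)

print(unsorted_counts)  # 'C' is over-counted, 'Cl' is missing
print(sorted_counts)    # matches the result shown above
```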


2017-12-02 23:27:57

A short re.findall() solution:

import re

list_given_atoms = ['C', 'Cl', 'Br']
list_of_molecules = ['C(B2Br)[Cl{H]Cl}P', 'NAME']
# Count each atom only where it is followed by a non-letter or the end of the string.
d = {a: len(re.findall(a + r'(?=[^a-z]|$)', list_of_molecules[0], re.I))
     for a in list_given_atoms}

print(d)

Output:

{'C': 1, 'Cl': 2, 'Br': 1}
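One property of this pattern is worth checking: the lookahead requires a non-letter after the symbol, so it relies on atoms being separated by other characters. The sketch below wraps the same pattern in a helper whose name, `count_isolated`, is made up for illustration. The 'CO2' input is also an assumption that does not appear in this answer:

```python
import re

def count_isolated(atoms, text):
    # Same pattern as above. With re.I, [^a-z] also excludes uppercase
    # letters, so a symbol directly followed by another letter is skipped.
    return {a: len(re.findall(a + r'(?=[^a-z]|$)', text, re.I)) for a in atoms}

separated = count_isolated(['C', 'Cl', 'Br'], 'C(B2Br)[Cl{H]Cl}P')
adjacent = count_isolated(['C', 'O'], 'CO2')

print(separated)  # atoms separated by brackets are counted as expected
print(adjacent)   # 'C' is not counted because 'O' follows it directly
```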


2017-12-02 23:34:52 RomanPerekhrest

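I tried your solutions and found that there can also be several C's directly after one another. So I came up with this:

import re

counter_dict = {}
# Each match is a tuple: (uppercase letter, lowercase letter or '').
for element in re.findall(r'([A-Z])([a-z]?)', list_of_molecules[0]):
    # Joining both parts handles one- and two-letter symbols alike.
    counter = element[0] + element[1]
    if counter not in counter_dict:
        counter_dict[counter] = 1
    else:
        counter_dict[counter] += 1

Elements with only one letter are added to the dictionary the same way. There may be a better way.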
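One possibly simpler variant, sketched here with `collections.Counter` instead of the manual bookkeeping, is shown below. The input string is a made-up example with adjacent atoms, not data from the question:

```python
import re
from collections import Counter

molecule = 'CC(Cl)CBr'  # hypothetical input with atoms directly after one another

# One uppercase letter plus an optional lowercase letter is treated as one symbol.
counter_dict = Counter(re.findall(r'[A-Z][a-z]?', molecule))

print(counter_dict)
```

Counter handles the "first occurrence" case itself, so no explicit membership test is needed.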


2017-12-03 15:24:28 Schoko

You can't use \w, because a word character is equivalent to:

[a-zA-Z0-9_]

which explicitly includes digits, so "B2Br" is matched as a single token.
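This is easy to confirm on the molecule from the question. The variable names in this sketch are illustrative. Note also that for `str` patterns in Python 3, `\w` is Unicode-aware by default, so the equivalence with `[a-zA-Z0-9_]` is exact only under `re.ASCII`:

```python
import re

molecule = 'C(B2Br)[Cl{H]Cl}P'

# \w matches letters, digits and underscore, so 'B2Br' stays in one piece.
word_tokens = re.findall(r'\w+', molecule)

# Restricting the class to letters splits the token at the digit.
letter_tokens = re.findall(r'[a-zA-Z]+', molecule)

print(word_tokens)
print(letter_tokens)
```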

You also can't just use the regex:

[a-zA-Z]+

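because for something like "CO2" this would yield a single token, "CO", when it should yield two separate atoms: C and O.

However, I came up with a regex (using regex101) that checks for one uppercase letter followed by between 0 and 1 lowercase letters, so the lowercase letter is optional.

That is: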

[A-Z][a-z]{0,1}

and it will yield the atoms correctly.
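A quick comparison of the two patterns illustrates the difference. This is only a sketch; the variable names are made up, and "HCl" is an extra input borrowed from the comments above:

```python
import re

formula = 'CO2'

# A run of letters keeps adjacent symbols glued together.
letters_only = re.findall(r'[a-zA-Z]+', formula)

# One uppercase letter plus at most one lowercase letter per match.
# {0,1} means the same as the shorter quantifier '?'.
per_symbol = re.findall(r'[A-Z][a-z]{0,1}', formula)

# Two-letter symbols are still kept whole.
hcl_symbols = re.findall(r'[A-Z][a-z]{0,1}', 'HCl')

print(letters_only)
print(per_symbol)
print(hcl_symbols)
```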

So, applying this to your original lists:

list_given_atoms= ['C', 'Cl', 'Br'] 
list_of_molecules= ['C(B2Br)[Cl{H]Cl}P' ,'NAME']

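we first want to find all the atoms in list_of_molecules and then build a dictionary with the counts of the atoms in list_given_atoms.

So, to find all the atoms, we can use re.findall on the first element of the molecule list: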

atoms = re.findall("[A-Z][a-z]{0,1}", list_of_molecules[0])

which gives the list:

['C', 'B', 'Br', 'Cl', 'H', 'Cl', 'P']

Then, to get the counts as a dictionary, we can use a dictionary comprehension:

counts = {a: atoms.count(a) for a in list_given_atoms}

which gives the desired result:

{'Cl': 2, 'C': 1, 'Br': 1}

It will also work when we have molecules such as CO2.
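Putting the two steps together, a minimal sketch of a reusable helper might look like this. The function name `count_atoms` and its parameters are hypothetical. It uses `collections.Counter` so the token list is scanned only once, which is a swap for the `list.count` calls above:

```python
import re
from collections import Counter

def count_atoms(molecule, wanted_atoms):
    """Count how often each symbol in wanted_atoms occurs in molecule."""
    # Tokenize into symbols: one uppercase letter, optional lowercase letter.
    found = Counter(re.findall(r'[A-Z][a-z]?', molecule))
    # A Counter returns 0 for symbols that never occurred.
    return {atom: found[atom] for atom in wanted_atoms}

list_given_atoms = ['C', 'Cl', 'Br']

print(count_atoms('C(B2Br)[Cl{H]Cl}P', list_given_atoms))
print(count_atoms('CO2', list_given_atoms))
```

This counts occurrences of symbols in the string, as in the answer above. It does not interpret digits as multiplicities, so the "2" in CO2 is simply ignored.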


2017-12-03 16:10:29
