Python: splitting a string into characters and numbers and storing them in a map

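'A15B7C2'

It represents the count of each character in the string.

I am currently using re to split it into characters and numbers. After that, I will eventually store them in a dictionary.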

import re 
data_str = 'A15B7C2' 
re.split(r"(\d+)", data_str)
# returns --> ['A', '15', 'B', '7', 'C', '2', '']

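However, if I have a string like

'A15B7CD2Ef5'

it means that the count of C is 1 (it is implicit) and the count of Ef is 5 (an uppercase letter followed by a lowercase letter counts as one key). As a result, I get

'CD' = 2 (not correct)
'Ef' = 5 (correct)

How can I modify this to get the correct counts? What is the best way to parse the string, get the counts, and store them in a dictionary?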

2017-06-21 holmes840

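So you can no longer split on one or more digits. What problem are you trying to solve? Is your string alphanumeric? –

@Wiktor: I did try. r"[^\W\d_]+|\d+", r"(\d+|\s+)", "([A-Za-z])([0-9]*)" and a few others, but I just could not get the regex right. Looking at the answers, it turns out a bit more focus would have helped ;) Even so, I could not have written the whole thing in one line :) – holmes840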

You can do it all in one go:

In [2]: s = 'A15B7CD2Ef5' 

In [3]: {k: int(v) if v else 1 for k,v in re.findall(r"([A-Z][a-z]?)(\d+)?", s)} 
Out[3]: {'A': 15, 'B': 7, 'C': 1, 'D': 2, 'Ef': 5}

The regex is essentially a direct translation of your requirements, making use of .findall and capture groups:

r"([A-Z][a-z]?)(\d+)?"

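That is: an uppercase letter that may be followed by a lowercase letter as the first group, and a run of digits that may or may not be present as the second group (if it is absent, .findall returns '' for that group).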
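To make the two groups easier to see, the same pattern can be written with named groups and re.VERBOSE. This is only a sketch: the function name parse_counts and the group names key and count are made up for illustration and are not part of the answer above.

```python
import re

# Same pattern as above, spelled out with named groups (the names are illustrative).
TOKEN = re.compile(
    r"""
    (?P<key>[A-Z][a-z]?)   # an uppercase letter, optionally followed by one lowercase letter
    (?P<count>\d+)?        # an optional run of digits
    """,
    re.VERBOSE,
)

def parse_counts(s):
    """Return a dict mapping each key to its count; a missing count means 1."""
    result = {}
    for m in TOKEN.finditer(s):
        count = m.group("count")  # None when the digits are absent
        result[m.group("key")] = int(count) if count else 1
    return result

print(parse_counts("A15B7CD2Ef5"))
```

Note that with finditer an unmatched optional group is reported as None, whereas findall reports it as ''; the truthiness test covers both.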

A more complex example:

In [7]: s = 'A15B7CD2EfFGHK5' 

In [8]: {k: int(v) if v else 1 for k,v in re.findall(r"([A-Z][a-z]?)(\d+)?", s)} 
Out[8]: {'A': 15, 'B': 7, 'C': 1, 'D': 2, 'Ef': 1, 'F': 1, 'G': 1, 'H': 1, 'K': 5}

Finally, breaking it down with an even trickier example:

In [10]: s = 'A15B7CD2EfFGgHHhK5' 

In [11]: re.findall(r"([A-Z](?:[a-z])?)(\d+)?", s) 
Out[11]: 
[('A', '15'), 
('B', '7'), 
('C', ''), 
('D', '2'), 
('Ef', ''), 
('F', ''), 
('Gg', ''), 
('H', ''), 
('Hh', ''), 
('K', '5')] 

In [12]: {k: int(v) if v else 1 for k,v in re.findall(r"([A-Z][a-z]?)(\d+)?", s)} 
Out[12]: 
{'A': 15, 
'B': 7, 
'C': 1, 
'D': 2, 
'Ef': 1, 
'F': 1, 
'Gg': 1, 
'H': 1, 
'Hh': 1, 
'K': 5}
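One thing the dictionary comprehension does silently: if a key appears more than once in the string, the later value overwrites the earlier one. The question does not say how repeats should be treated, so the following is only a sketch under the assumption that repeated keys should be summed; the name sum_counts is hypothetical.

```python
import re
from collections import Counter

def sum_counts(s):
    """Sum the counts of repeated keys instead of overwriting them (an assumption)."""
    totals = Counter()
    for key, digits in re.findall(r"([A-Z][a-z]?)(\d+)?", s):
        # findall yields '' for the digits when the count is implicit
        totals[key] += int(digits) if digits else 1
    return dict(totals)

print(sum_counts("A2BA3"))
```

For 'A2BA3' the comprehension would keep only the last value for A, while this version adds the two counts together.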

2017-06-21 18:21:02

@Jan Ah yes, thank you for the correction. –

The complete one-line solution is awesome! Thanks a lot :) – holmes840

Search the string for the letters instead of the digits.

import re 
data_str = 'A15B7C2' 
temp = re.split("([A-Za-z])", data_str)[1:]  # the first element is just "", which we don't want
temp = [a if a != "" else "1" for a in temp]  # add the 1s that were implicit in the original string
finalDict = dict(zip(temp[0::2], temp[1::2])) # turn the list into a dict
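As written, the snippet splits on single letters and stores the counts as strings, so for a string such as 'A15B7CD2Ef5' the pair Ef would end up as two separate keys. A possible adjustment, offered as a sketch rather than as the answerer's code, is to split on an uppercase letter with an optional lowercase letter and convert the counts to int. The helper name split_counts is made up here.

```python
import re

def split_counts(data_str):
    # Splitting on a capturing group keeps the keys in the result list.
    # The first element is whatever precedes the first key (normally ""), so drop it.
    parts = re.split(r"([A-Z][a-z]?)", data_str)[1:]
    keys = parts[0::2]
    # An empty string after a key means its count was implicit.
    counts = [int(p) if p else 1 for p in parts[1::2]]
    return dict(zip(keys, counts))

print(split_counts("A15B7CD2Ef5"))
```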

2017-06-21 18:16:53 jacoblaw

You can use a regex together with some logic and .span():

([A-Z])[a-z]*(\d+)

See a demo on regex101.com.

In Python this would be:

import re

string = "A15B7CD2Ef5"
rx = re.compile(r'([A-Z])[a-z]*(\d+)')

def analyze(string):
    result = []
    lastpos = 0
    for m in rx.finditer(string):
        start, end = m.span()
        # characters skipped between two matches carry no digits: implicit count of 1
        for ch in string[lastpos:start]:
            result.append((ch, 1))
        # always record the match itself, so that 'D2' is not lost after the gap
        result.append((m.group(1), m.group(2)))
        lastpos = end
    return result

print(analyze(string))
# [('A', '15'), ('B', '7'), ('C', 1), ('D', '2'), ('E', '5')]

2017-06-21 18:20:24 Jan

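This stays consistent with the original logic. Instead of using re.split(), we can find all the numbers, split the string at the first match, keep the second half of the string for the next split, and store the pairs as tuples for later use.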

import re

raw = "A15B7CD2Ef5"
# find all the numbers
found = re.findall(r"(\d+)", raw)
# save the pairs as a list of tuples
pairs = []
# check that numbers were found
if found:
    # iterate over all matches
    for f in found:
        # split the raw string, with a max split of one, so that duplicate numbers don't cause more than 2 parts
        part = raw.split(f, 1)
        # set the original string to the second half of the split
        raw = part[1]
        # append pair
        pairs.append((part[0], f))

# Now for fun expand values
long_str = ""
for p in pairs:
    long_str += p[0] * int(p[1])

print(pairs)
print(long_str)
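The pairs produced this way still keep 'CD' together as one key with count 2, which is the result the question calls incorrect. One way to get from such pairs to the requested dictionary, sketched here under the assumption that a key is an uppercase letter optionally followed by one lowercase letter, is to break each letter run into keys and give the explicit count only to the last one. The function name pairs_to_counts is hypothetical.

```python
import re

def pairs_to_counts(pairs):
    """Turn (letters, digits) pairs such as ('CD', '2') into a dict of counts."""
    counts = {}
    for letters, digits in pairs:
        keys = re.findall(r"[A-Z][a-z]?", letters)
        # Every key except the last has no digits after it: implicit count of 1.
        for key in keys[:-1]:
            counts[key] = 1
        if keys:
            counts[keys[-1]] = int(digits)
    return counts

pairs = [("A", "15"), ("B", "7"), ("CD", "2"), ("Ef", "5")]
print(pairs_to_counts(pairs))
```

Letters that follow the last number in the input never appear in any pair, so they would still need separate handling.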

2017-06-21 18:27:38 reticentroot
