Python regex, ASCII escape characters to tags

I have the following output from xterm:

text = '\x1b[0m\x1b[01;32mattr\x1b[0m\n\x1b[01;36mawk\x1b[0m\n\x1b[01;32mbasename\x1b[0m\n\x1b[01;32mbash\n\x1b[0many text'

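I know that \x1b[0m removes all text attributes, \x1b[01 is bold text, \x1b[32m is green text, and \x1b[01;32m is bold green text. So how can I convert these escape sequences into my own tags?

I want my text variable to become this: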

text = '<bold><green>attr</bold></green>\n<bold><cyan>awk</bold></cyan>\n<bold><green>basename</bold></green>\n<bold><green>bash</bold></green>\nanytext'

2016-12-14 Caaarlos

import re

text = '\x1b[0m\x1b[01;32mattr\x1b[0m\n\x1b[01;36mawk\x1b[0m\n\x1b[01;32mbasename\x1b[0m\n\x1b[01;32mbash\n\x1b[0many text'

# dictionary mapping text attributes to tag names
fmt = {'01': 'bold', '32m': 'green', '36m': 'cyan'}
# regex that captures an optional leading newline, the text attributes and the text
# (raw string, so the backslashes reach the regex engine unchanged)
groups = re.findall(r'(\n?)\x1b\[((?:(?:0m|32m|01|36m);?)+)([a-zA-Z ]+)', text)
# iterate through the groups and build the new string
xml = []
for group in groups:
    g_text = group[2]  # the text itself
    for tag in group[1].split(';'):  # the text attributes
        if tag not in fmt:
            continue
        tag = fmt[tag]
        g_text = '<%s>%s</%s>' % (tag, g_text, tag)
    g_text = group[0] + g_text  # add a newline if necessary
    xml.append(g_text)
xml_text = ''.join(xml)

print(xml_text)

<green><bold>attr</bold></green> 
<cyan><bold>awk</bold></cyan> 
<green><bold>basename</bold></green> 
<green><bold>bash</bold></green> 
any text

For a demo of the regular expression, see this link: Debuggex Demo

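The regex currently assumes that the actual text contains only alphabetic characters or spaces, but feel free to change the group ([a-zA-Z ]+) at the end of the regex to include any other characters you may have in your text.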
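As a minimal sketch of that change, the last group can be swapped for a negated character class that accepts anything except the escape character and a newline. The sample string below is made up for illustration and is not part of the original output.

```python
import re

# Hypothetical sample: the text now contains digits, dots, hyphens and underscores.
sample = '\x1b[01;32mfile-1.txt\x1b[0m\n\x1b[01;36mawk_2\x1b[0m'

# Same pattern as in the answer, except that the last group matches any run of
# characters that is neither ESC (\x1b) nor a newline, instead of letters and spaces.
pattern = r'(\n?)\x1b\[((?:(?:0m|32m|01|36m);?)+)([^\x1b\n]+)'

groups = re.findall(pattern, sample)
print(groups)
```

Excluding the newline keeps the optional leading (\n?) group meaningful; if your text can legitimately span several lines, that choice would need revisiting.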

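Also, I assume you have more text attributes than bold, green and cyan. You will need to update the fmt dictionary with the other attributes and their mappings.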
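One possible way to extend the dictionary is to generate the colour entries, since the standard SGR foreground colours occupy codes 30 to 37. This is only a sketch: the tag names are an arbitrary choice, and the keys are bare numeric codes without the trailing m, so it assumes a parser that strips the m before the lookup.

```python
# Standard SGR foreground colour codes run from 30 (black) to 37 (white).
COLOURS = ['black', 'red', 'green', 'yellow', 'blue', 'magenta', 'cyan', 'white']

# Tag names are an assumption made for this sketch; pick whatever your markup needs.
fmt = {'01': 'bold', '04': 'underline'}
fmt.update({str(30 + i): name for i, name in enumerate(COLOURS)})

print(fmt['32'], fmt['36'])
```

Terminals may emit 1 as well as 01 for bold, so a real mapping would probably need both spellings or a normalisation step.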

Edit

@Caaarlos has requested in the comments (below) that an ANSI code be kept in the output if it does not appear in the fmt dictionary:

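import re

text = '\x1b[0m\x1b[01;32;35mattr\x1b[0;7m\n\x1b[01;36mawk\x1b[0m\n\x1b[01;32;47mbasename\x1b[0m\n\x1b[01;32mbash\n\x1b[0many text'

# keys are bare numeric codes; the trailing 'm' is stripped before the lookup
fmt = {'01': 'bold', '32': 'green', '36': 'cyan'}

xml = []
active_tags = []  # tags opened so far and not yet closed
# every non-empty chunk after splitting on ESC[ is codes, then 'm', then text
for group in re.split(r'\x1b\[', text):
    if group.strip():
        # maxsplit=1 so that text such as '10m' cannot cause a second split
        codes, chunk = re.split(r'((?:\d+;?)+)m', group, maxsplit=1)[1:]
        not_found = []
        for tag in codes.split(';'):
            if tag in fmt:
                tag = fmt[tag]
                chunk = '<%s>%s' % (tag, chunk)
                active_tags.append(tag)
            elif tag == '0':
                # reset: prepend a closing tag for everything still open
                for a_tag in active_tags[::-1]:
                    chunk = '</%s>%s' % (a_tag, chunk)
                active_tags = []
            else:
                not_found.append(tag)
        if not_found:
            # keep codes missing from fmt as a raw escape sequence
            chunk = '\x1b[%sm%s' % (';'.join(not_found), chunk)
        xml.append(chunk)
xml_text = ''.join(xml)

print(repr(xml_text))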

'\x1b[35m<green><bold>attr\x1b[7m</bold></green>\n<cyan><bold>awk</bold></cyan>\n\x1b[47m<green><bold>basename</bold></green>\n<green><bold>bash\n</bold></green>any text'

Note that the edited code above also handles the case where a tag is not closed directly after the text.
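To see that carry-over behaviour in isolation, the same idea can be wrapped in a function. This is a sketch rather than the code from the answer: ansi_to_tags is a hypothetical name, it uses re.sub with a callback instead of re.split, and it emits opening tags in code order, so the nesting order differs from the output shown above.

```python
import re

# Matches one SGR sequence: ESC [ parameters m
SGR = re.compile(r'\x1b\[([\d;]*)m')

def ansi_to_tags(text, fmt):
    """Replace known SGR codes with tags; keep unknown codes as escapes."""
    active = []  # tags opened so far and not yet closed

    def replace(match):
        out, unknown = [], []
        # An empty parameter list (ESC[m) is treated as a reset, like 0.
        for code in (match.group(1) or '0').split(';'):
            if code in fmt:
                out.append('<%s>' % fmt[code])
                active.append(fmt[code])
            elif code.lstrip('0') == '':  # '', '0' and '00' all mean reset
                # close everything still open, most recently opened first
                out.extend('</%s>' % tag for tag in reversed(active))
                active.clear()
            else:
                unknown.append(code)
        if unknown:
            # codes missing from fmt stay in the output as a raw escape
            out.append('\x1b[%sm' % ';'.join(unknown))
        return ''.join(out)

    return SGR.sub(replace, text)

fmt = {'01': 'bold', '32': 'green'}
result = ansi_to_tags('\x1b[01;32mbash\nstill green\x1b[0m done\x1b[35m!', fmt)
print(repr(result))
```

The tags opened before bash stay open across the newline and are only closed when the reset code arrives, while the unknown code 35 passes through unchanged.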

2016-12-14 15:51:12 bunji

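Thanks Bunji, it works fine. However, when fmt has no entry for a pattern, I would like my text variable to keep the ASCII escape sequence (for debugging purposes). For example, if the '36m' pattern is missing, I want my final output to be: attr \x1b[01;36mawk basename bash any text – Caaarlos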

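@Caaarlos, the answer has been updated with an edit that handles codes not found in fmt. It also handles the case where a tag should not be closed directly after the text. – bunji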
