關於Using Regular Expressions for Lexical Analysis在effbot.org有一篇很好的文章。
適應記號賦予你的問題:
import re
token_pattern = r"""
(?P<identifier>[a-zA-Z_][a-zA-Z0-9_]*)
|(?P<integer>[0-9]+)
|(?P<dot>\.)
|(?P<open_variable>[$][{])
|(?P<open_curly>[{])
|(?P<close_curly>[}])
|(?P<newline>\n)
|(?P<whitespace>\s+)
|(?P<equals>[=])
|(?P<slash>[/])
"""
token_re = re.compile(token_pattern, re.VERBOSE)
class TokenizerException(Exception): pass
def tokenize(text):
pos = 0
while True:
m = token_re.match(text, pos)
if not m: break
pos = m.end()
tokname = m.lastgroup
tokvalue = m.group(tokname)
yield tokname, tokvalue
if pos != len(text):
raise TokenizerException('tokenizer stopped at pos %r of %r' % (
pos, len(text)))
爲了測試它,我們這樣做:
stuff = r'property.${general.name}.ip = ${general.ip}'
stuff2 = r'''
general {
name = myname
ip = 127.0.0.1
}
'''
print ' stuff '.center(60, '=')
for tok in tokenize(stuff):
print tok
print ' stuff2 '.center(60, '=')
for tok in tokenize(stuff2):
print tok
爲:
========================== stuff ===========================
('identifier', 'property')
('dot', '.')
('open_variable', '${')
('identifier', 'general')
('dot', '.')
('identifier', 'name')
('close_curly', '}')
('dot', '.')
('identifier', 'ip')
('whitespace', ' ')
('equals', '=')
('whitespace', ' ')
('open_variable', '${')
('identifier', 'general')
('dot', '.')
('identifier', 'ip')
('close_curly', '}')
========================== stuff2 ==========================
('newline', '\n')
('identifier', 'general')
('whitespace', ' ')
('open_curly', '{')
('newline', '\n')
('whitespace', ' ')
('identifier', 'name')
('whitespace', ' ')
('equals', '=')
('whitespace', ' ')
('identifier', 'myname')
('newline', '\n')
('whitespace', ' ')
('identifier', 'ip')
('whitespace', ' ')
('equals', '=')
('whitespace', ' ')
('integer', '127')
('dot', '.')
('integer', '0')
('dot', '.')
('integer', '0')
('dot', '.')
('integer', '1')
('newline', '\n')
('close_curly', '}')
('newline', '\n')
爲什麼不使用的,而不是創建自己的解析器JSON ?? – AndiDog 2010-03-01 20:39:12
您的示例翻譯似乎有一些錯誤。如果沒有,我不明白爲什麼在示例的第3行中將「$ {component1} .ip」轉換爲「component1」。 如果語法是正則表達式,我可能會用正則表達式翻譯$ {identifiers},並用沒有字典條目的字典查找來替換它們。 – msw 2010-03-01 20:42:15
那裏有一些錯誤,我想我已經糾正了它們。 – 2010-03-01 20:45:26