2013-12-13

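I have a preprocessed C file and need to list the members of one of its enums. pyparsing ships with a simple example (examples/cpp_enum_parser.py), but it only works when the enum values are positive integers. In real code, values can be negative, hexadecimal, or arbitrary expressions. Is it possible to parse non-trivial C enums with pyparsing?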

I don't need the values in structured form, only the names.

enum hello { 
    minusone=-1, 
    par1 = ((0,5)), 
    par2 = sizeof("a\")bc};,"),
    par3 = (')') 
}; 

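When parsing a value, the parser should skip everything up to one of the characters [('",}] and then handle that character. Regex or SkipTo might be useful for the skipping, QuotedString for string and character literals, and Forward for nested parentheses (see examples/fourFn.py).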
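As a rough illustration of that skipping strategy, here is a hand-written scanner in plain Python (no pyparsing). The function name `enum_member_names` is made up for this sketch, and it assumes it receives only the text between the enum's braces, with comments already removed by the preprocessor.

```python
def enum_member_names(body):
    """Return member names from the text between an enum's braces (sketch)."""
    names = []
    i, n = 0, len(body)
    while i < n:
        # Skip whitespace and separators before the next name.
        while i < n and (body[i].isspace() or body[i] == ','):
            i += 1
        start = i
        while i < n and (body[i].isalnum() or body[i] == '_'):
            i += 1
        if i == start:
            break  # no identifier here: stop rather than guess
        names.append(body[start:i])
        # Skip the optional "= value" up to the next top-level comma.
        depth = 0
        while i < n:
            ch = body[i]
            if ch in '"\'':
                # Skip a string or character literal, honouring backslash escapes.
                i += 1
                while i < n and body[i] != ch:
                    i += 2 if body[i] == '\\' else 1
                i += 1  # step past the closing quote
                continue
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
            elif ch == ',' and depth == 0:
                break  # a comma outside quotes and parentheses ends the value
            i += 1
    return names


sample = r'''
    minusone=-1,
    par1 = ((0,5)),
    par2 = sizeof("a\")bc};,"),
    par3 = (')')
'''
print(enum_member_names(sample))  # ['minusone', 'par1', 'par2', 'par3']
```

The sketch only tracks quotes and parentheses, which is enough for the sample above; it makes no attempt to validate the value expressions themselves.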

Answers


I modified the original example. I don't know why enum.ignore(cppStyleComment) was removed from the original script, so I put it back.

from pyparsing import * 
# sample string with enums and other stuff 
sample = ''' 
    stuff before 
    enum hello { 
     Zero, 
     One, 
     Two, 
     Three, 
     Five=5, 
     Six, 
     Ten=10, 
     minusone=-1, 
     par1 = ((0,5)), 
     par2 = sizeof("a\\")bc};,"), 
     par3 = (')') 
     }; 
    in the middle 
    enum 
     { 
     alpha, 
     beta, 
     gamma = 10 , 
     zeta = 50 
     }; 
    at the end 
    ''' 

# syntax we don't want to see in the final parse tree 
LBRACE,RBRACE,EQ,COMMA = map(Suppress,"{}=,") 


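lpar = Literal("(")
rpar = Literal(")")
# Runs of text containing no quotes or parentheses. At the top level of a
# value, a comma or closing brace also ends the run.
anything_topl = Regex(r"[^'\"(,}]+")
anything = Regex(r"[^'\"()]+")

# Inside parentheses, commas and braces are ordinary text, so the grammar
# is recursive: a parenthesized group contains another expr.
expr = Forward()
pths_or_str = quotedString | lpar + expr + rpar
expr <<= ZeroOrMore(pths_or_str | anything)
expr_topl = ZeroOrMore(pths_or_str | anything_topl)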

_enum = Suppress('enum') 
identifier = Word(alphas,alphanums+'_') 
expr_topl_text = originalTextFor(expr_topl) 
enumValue = Group(identifier('name') + Optional(EQ + expr_topl_text('value'))) 
enumList = Group(ZeroOrMore(enumValue + COMMA) + Optional(enumValue)) 
enum = _enum + Optional(identifier('enum')) + LBRACE + enumList('names') + RBRACE 
enum.ignore(cppStyleComment) 

# find instances of enums ignoring other syntax 
for item, start, stop in enum.scanString(sample):
    for entry in item.names:
        # strip() drops trailing whitespace that the top-level Regex can pick up
        print('%s %s = %s' % (item.enum, entry.name, entry.value.strip()))

Result:

$ python examples/cpp_enum_parser.py 
hello Zero = 
hello One = 
hello Two = 
hello Three = 
hello Five = 5 
hello Six = 
hello Ten = 10 
hello minusone = -1 
hello par1 = ((0,5)) 
hello par2 = sizeof("a\")bc};,") 
hello par3 = (')') 
alpha = 
beta = 
gamma = 10 
zeta = 50 

You have to special-case the terms that might contain a comma or a right brace that does not mark the end of the enum value.

from pyparsing import * 

sample = r""" 
enum hello { 
    minusone=-1, 
    par1 = ((0,5)), 
    par2 = sizeof("a\")bc};,"), 
    par3 = (')') 
}; 
""" 

ENUM = Keyword("enum") 
LBRACE,RBRACE,COMMA,EQ = map(Suppress, "{},=") 
identifier = Word(alphas+"_", alphanums+"_") 
identifier.setName("identifier")#.setDebug() 

funcCall = identifier + nestedExpr() 

enum_value = nestedExpr() | quotedString | funcCall | SkipTo(COMMA | RBRACE) 

enum_decl = (ENUM + Optional(identifier, '')("ident") + LBRACE + 
    OneOrMore(identifier + Optional(EQ + enum_value).suppress() + Optional(COMMA))("names") + 
    RBRACE 
    ) 

for enum in enum_decl.searchString(sample):
    # print() call form so the script runs on Python 3
    print(enum.ident, ','.join(enum.names))

Prints:

hello minusone,par1,par2,par3 
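
To see why the special-casing matters, compare with a naive approach in plain Python (not pyparsing) that treats every comma as a member separator. The variable names here are only for illustration. On the same sample it reports fragments of values as if they were names:

```python
body = r'''
    minusone=-1,
    par1 = ((0,5)),
    par2 = sizeof("a\")bc};,"),
    par3 = (')')
'''

# Naive approach: split on every comma, then keep the text before '='.
naive = [part.split('=')[0].strip() for part in body.split(',')]
print(naive)  # ['minusone', 'par1', '5))', 'par2', '")', 'par3']
```

The bogus entries come from the comma inside the nested parentheses and the comma inside the string literal, which are exactly the cases nestedExpr and quotedString absorb in the grammar above.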

I hadn't noticed nestedExpr. Thanks. – basin