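pyparsing recursive value list (IBM Rhapsody)

I am building a parser for the IBM Rhapsody sbs file format. Unfortunately, the recursive part does not work as expected. The rule pp.Word(pp.printables + " ") is probably the problem, because it also matches ; and {}. But at least ; can also be part of a value.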
import pyparsing as pp
import pprint
TEST = r"""{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
"""
def flat(loc, toks):
    if len(toks[0]) == 1:
        return toks[0][0]
assignment = pp.Suppress("-") + pp.Word(pp.alphanums + "_") + pp.Suppress("=")
value = pp.OneOrMore(
    pp.Group(assignment + (
        pp.Group(pp.OneOrMore(
            pp.QuotedString('"', escChar="\\", multiline=True) +
            pp.Suppress(";"))).setParseAction(flat) |
        pp.Word(pp.alphas) + pp.Suppress(";") |
        pp.Word(pp.printables + " ")
    ))
)
expr = pp.Forward()
expr = pp.Suppress("{") + pp.Word(pp.alphas) + (
    value | (assignment + expr) | expr
) + pp.Suppress("}")
expr = expr.ignore(pp.pythonStyleComment)
print(TEST)
pprint.pprint(expr.parseString(TEST).asList())
Output:
% python prase.py
{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
['foo',
['key', 'bla'],
['value', '1243; 1233; 1235;'],
['_hans', 'hammer\n time'],
['HaMer', '765; 786; 890;'],
['value', '\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n '],
['_mText', '12.11.2015::13:20:0;'],
['value', ['war', 'fist']],
['_obacht', 'fish,car,button'],
['_id', 'gibml c0d8-4535-898f-968362779e07;'],
['bam', '{ boing'],
['key', 'bla']]
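Is there a typo in TEST? Should the last group after ' - bam {boing etc.}' be ' - something = {boing \n- key = bla; }'? It is hard to see what this format is supposed to be, with so many OneOrMore's thrown in here and there. I think things would be much clearer if you stopped and wrote a BNF first. – PaulMcG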
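Also, I strongly advise against using expressions that match too much, such as 'pp.Word(printables + " ")'. Read up on the latest version of pyparsing's Word class, which includes an 'excludeChars' argument, so that if you really need something like 'Word(anything printable except ';')', you can write 'Word(printables, excludeChars=';')'. – PaulMcG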
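To illustrate the excludeChars suggestion above, here is a minimal sketch. The names raw_value and value_list are made up for this example, and the choice to exclude braces as well as the semicolon is an assumption about the format, not something the comment prescribes.

```python
import pyparsing as pp

# Printable characters plus space, minus the delimiters that must end a value.
# Excluding "{" and "}" too is an assumption, so a value cannot swallow a block.
raw_value = pp.Word(pp.printables + " ", excludeChars=";{}")

# One or more values, each terminated by ";" (the ";" itself is dropped).
value_list = pp.OneOrMore(raw_value + pp.Suppress(";"))

numbers = value_list.parseString("1243; 1233; 1235;").asList()
ident = value_list.parseString("gibml c0d8-4535-898f-968362779e07;").asList()
print(numbers)
print(ident)
```

Because the semicolon is no longer part of the word, each value in a list comes back as its own token instead of one long string such as '1243; 1233; 1235;'.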
Unfortunately, the format is correct. A real example: https://github.com/mansam/exploring-rhapsody/blob/master/LightSwitch/LightSwitch.rpy – delijati
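Following the BNF suggestion, one possible reading of the format is: block := "{" name item* "}", item := "-" key "=" (block+ | (value ";")+). The sketch below implements that reading and is not a verified Rhapsody grammar. All names (block, item, raw, SAMPLE) are hypothetical, and it assumes unquoted values never span lines or contain braces. Note that the posted code rebinds expr with = after creating the Forward, which leaves the placeholder empty; the sketch fills it with <<= instead.

```python
import pyparsing as pp

LBRACE, RBRACE, SEMI, DASH, EQ = map(pp.Suppress, "{};-=")

# Placeholder for the recursive rule; it is filled in below with <<=.
block = pp.Forward()

key = pp.Word(pp.alphanums + "_")
quoted = pp.QuotedString('"', escChar="\\", multiline=True)
# Unquoted value: everything up to the next ';' on the same line, no braces.
# CharsNotIn does not skip leading whitespace, so strip it afterwards.
raw = pp.CharsNotIn(';{}"\n').setParseAction(lambda t: t[0].strip())
values = pp.OneOrMore((quoted | raw) + SEMI)

# An item is "- key =" followed by nested blocks or a ';'-terminated value list.
item = pp.Group(DASH + key + EQ + (pp.OneOrMore(block) | values))

# Assign into the Forward instead of rebinding the name.
block <<= pp.Group(LBRACE + pp.Word(pp.alphas) + pp.ZeroOrMore(item) + RBRACE)

SAMPLE = """{ foo
- key = bla;
- value = 1243; 1233; 1235;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
"""

parsed = block.parseString(SAMPLE).asList()[0]
print(parsed)
```

With this structure the nested blocks after bam are parsed as nested lists rather than as the flat string '{ boing'. Trying block before the value list matters less here because raw cannot start a match on "{", but the ordering keeps the intent explicit.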