If you want more flexibility in your own syntax, here is a simple parser for the data definitions you have given:
data = """\
a = [-vegetable, +fruit, +apple, -orange, -citrus]
o = [-vegetable, +fruit, -apple, +orange, +citrus]
t = [+vegetable, -fruit]"""
from pyparsing import Word, alphas, oneOf, Group, delimitedList
# a basic token for a word of alpha characters plus underscores
ident = Word(alphas + '_')
# define a token for leading '+' or '-', with parse action to convert to bool value
inclFlag = oneOf('+ -')
inclFlag.setParseAction(lambda t: t[0] == '+')
# define a feature as the combination of an inclFlag and a feature name
feature = Group(inclFlag('has') + ident('feature'))
# define a definition
defn = ident('name') + '=' + '[' + delimitedList(feature)('features') + ']'
# search through the input test data for defns, and print out the parsed data
# by name, and the associated features
defns = defn.searchString(data)
for d in defns:
    print(d.dump())
    for f in d.features:
        print(f.dump(' '))
    print()
Prints:
['a', '=', '[', [False, 'vegetable'], [True, 'fruit'], [True, 'apple'], [False, 'orange'], [False, 'citrus'], ']']
- features: [[False, 'vegetable'], [True, 'fruit'], [True, 'apple'], [False, 'orange'], [False, 'citrus']]
- name: a
[False, 'vegetable']
- feature: vegetable
- has: False
[True, 'fruit']
- feature: fruit
- has: True
[True, 'apple']
- feature: apple
- has: True
[False, 'orange']
- feature: orange
- has: False
[False, 'citrus']
- feature: citrus
- has: False
['o', '=', '[', [False, 'vegetable'], [True, 'fruit'], [False, 'apple'], [True, 'orange'], [True, 'citrus'], ']']
- features: [[False, 'vegetable'], [True, 'fruit'], [False, 'apple'], [True, 'orange'], [True, 'citrus']]
- name: o
[False, 'vegetable']
- feature: vegetable
- has: False
[True, 'fruit']
- feature: fruit
- has: True
[False, 'apple']
- feature: apple
- has: False
[True, 'orange']
- feature: orange
- has: True
[True, 'citrus']
- feature: citrus
- has: True
['t', '=', '[', [True, 'vegetable'], [False, 'fruit'], ']']
- features: [[True, 'vegetable'], [False, 'fruit']]
- name: t
[True, 'vegetable']
- feature: vegetable
- has: True
[False, 'fruit']
- feature: fruit
- has: False
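Pyparsing takes care of a lot of the overhead for you, such as iterating over the input string, skipping irrelevant whitespace, and returning the parsed data with named attributes. Take a look at the boolean evaluator on the pyparsing wiki (SimpleBool.py), or at the more complete boolean evaluator package booleano.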
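To illustrate what the named attributes buy you, here is a minimal sketch that collects the parsed definitions into a plain dictionary keyed by name. It reuses the grammar from above unchanged; the helper name build_feature_table is hypothetical and only for illustration, not part of pyparsing.

```python
from pyparsing import Word, alphas, oneOf, Group, delimitedList

data = """\
a = [-vegetable, +fruit, +apple, -orange, -citrus]
o = [-vegetable, +fruit, -apple, +orange, +citrus]
t = [+vegetable, -fruit]"""

# Same grammar as in the parser above
ident = Word(alphas + '_')
inclFlag = oneOf('+ -')
inclFlag.setParseAction(lambda t: t[0] == '+')
feature = Group(inclFlag('has') + ident('feature'))
defn = ident('name') + '=' + '[' + delimitedList(feature)('features') + ']'


def build_feature_table(text):
    """Hypothetical helper: map each definition name to {feature: bool}."""
    table = {}
    for d in defn.searchString(text):
        # The results names ('name', 'features', 'feature', 'has')
        # are read back as attributes of the parse results.
        table[d.name] = {f.feature: f.has for f in d.features}
    return table


table = build_feature_table(data)
print(table['a']['fruit'])   # True
print(table['t'])            # {'vegetable': True, 'fruit': False}
```

With the data in this shape, a lookup such as table['o']['citrus'] is an ordinary dictionary access, which may be a convenient starting point for evaluating constraints over the features.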
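Unless you want to parse queries posed in natural language, you don't need NLTK. Parsing boolean expressions, even with arbitrary nesting allowed, is fairly simple with most parsing techniques (and almost trivial with some). – delnan
I do. I'm trying to write an Optimality Theory eval() function for ranking phonological constraints. – Pygmalion
If 't' stands for 'tomato', that is actually a fruit, at least botanically. http://oxforddictionaries.com/page/tomatofruitveg – PaulMcG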