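I would like to parse queries against a chemical elements database using pyparsing.
The database is stored in an XML file. Parsing that file produces a nested dictionary, which is stored in a singleton object that inherits from collections.OrderedDict. Asking for an element gives me an ordered dictionary of its properties (i.e. ELEMENTS['C'] -> {'name': 'carbon', 'neutron': 0, 'proton': 6, ...}). Conversely, asking for a property gives me an ordered dictionary of its values for all elements (i.e. ELEMENTS['proton'] -> {'H': 1, 'He': 2, ...}).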
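To make the two kinds of lookup concrete, here is a minimal sketch of how such an object could behave. It is not the real ElementsDatabase module: the class name `ElementsDB`, the instance `DEMO_ELEMENTS` and the three sample entries are made up for illustration, and it assumes that property names never collide with element symbols.

```python
from collections import OrderedDict


class ElementsDB(OrderedDict):
    """Hypothetical stand-in for the ELEMENTS singleton (illustration only)."""

    def __getitem__(self, key):
        # An element symbol such as 'C' is a regular key: return its properties.
        if key in self:
            return OrderedDict.__getitem__(self, key)
        # Otherwise treat the key as a property name and collect it per element,
        # keeping the insertion order of the elements.
        result = OrderedDict()
        for symbol in self:
            props = OrderedDict.__getitem__(self, symbol)
            if key in props:
                result[symbol] = props[key]
        if not result:
            raise KeyError(key)
        return result


# Sample data, assumed for the example only.
DEMO_ELEMENTS = ElementsDB([
    ('H', OrderedDict([('name', 'hydrogen'), ('proton', 1)])),
    ('He', OrderedDict([('name', 'helium'), ('proton', 2)])),
    ('C', OrderedDict([('name', 'carbon'), ('proton', 6)])),
])

by_element = DEMO_ELEMENTS['C']        # properties of one element
by_property = DEMO_ELEMENTS['proton']  # one property across all elements
```

With this layout, `list(by_property.values())` is a column that lines up with the element order, which is what the index-based queries rely on.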
A typical query could be:
mass > 10 or (nucleon < 20 and atomic_radius < 5)
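where each "sub-query" (e.g. mass > 10) returns the set of elements that match it.
The query is then converted internally into a string, which is evaluated to produce the set of indices of the elements that match it. In this context, the and/or operators are not boolean operators but set operators acting on Python sets.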
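As a rough illustration of these set semantics, the typical query above can be evaluated by hand as follows. The helper `matching` and all the numbers are invented for the example (they are not taken from the database); list positions play the role of element indices.

```python
# Made-up property columns; position i in each list belongs to element i.
mass = [1.0, 4.0, 12.0, 20.2]
nucleon = [1, 4, 12, 20]
atomic_radius = [2.0, 6.0, 3.0, 7.0]


def matching(values, predicate):
    """Return the set of indices whose value satisfies the predicate."""
    return {i for i, v in enumerate(values) if predicate(v)}


# mass > 10 or (nucleon < 20 and atomic_radius < 5)
heavy = matching(mass, lambda v: v > 10)                 # {2, 3}
few_nucleons = matching(nucleon, lambda v: v < 20)       # {0, 1, 2}
small_radius = matching(atomic_radius, lambda v: v < 5)  # {0, 2}

# 'and' maps to set intersection (&), 'or' maps to set union (|).
result = heavy | (few_nucleons & small_radius)           # {0, 2, 3}
```

Note that Python's own `and`/`or` keywords do not behave this way on sets: `{2, 3} or {0, 2}` simply evaluates to `{2, 3}`, so a string handed to eval needs `&` and `|` to get intersection and union.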
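I recently posted a question about building such a query. Thanks to the useful answers I received, I think I have more or less got it working (in a nice way, I hope!), but I still have some questions related to pyparsing.
Here is my code: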
import numpy
from pyparsing import *
# This imports a singleton object storing the database dictionary as
# described earlier
from ElementsDatabase import ELEMENTS
and_operator = oneOf(['and','&'], caseless=True)
or_operator = oneOf(['or' ,'|'], caseless=True)
# ELEMENTS.properties is a property getter that returns the list of
# registered properties in the database
props = oneOf(ELEMENTS.properties, caseless=True)
# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props
# When parsed, it must be replaced by the following expression that
# will be eval later.
props.setParseAction(lambda t : "numpy.array(ELEMENTS['%s'].values())" % t[0].lower())
quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
float_ = Regex(r'[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?').setParseAction(lambda t:float(t[0]))
comparison_operator = oneOf(['==','!=','>','>=','<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
comparison_expr.setParseAction(lambda t : "set(numpy.where(%s%s%s)[0])" % tuple(t))
grammar = Combine(operatorPrecedence(comparison_expr, [(and_operator, 2, opAssoc.LEFT), (or_operator, 2, opAssoc.LEFT)]))
# A test query
res = grammar.parseString('"mass " > 30 or (nucleon == 1)',parseAll=True)
print eval(' '.join(res._asStringList()))
My questions are the following:
1 Using 'transformString' instead of 'parseString' never raises an
exception, even when the string to be parsed does not match the grammar.
However, it is exactly the functionality I need. Is there a way to make it fail on invalid input?
2 I would like to reintroduce whitespace between my tokens so that
my eval does not fail. The only way I found to do so is the one
implemented above. Do you see a better way of doing this with pyparsing?
Sorry for the long post, but I wanted to describe the context in some detail. By the way, if you think this approach is a bad one, do not hesitate to tell me!
Thank you very much for your help.
Eric
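Nice work! Glad you were able to work this out. This is very similar to the search engine query parser in "Getting Started with Pyparsing". – PaulMcG
Oh, by the way: if you try "cond_a and cond_b and cond_c", you will get '[[cond_a, 'and', cond_b, 'and', cond_c]]' passed to your parse action. The easiest way to handle this is with slicing: change and_operator to 'return all(token[0][::2])' and or_operator to 'return any(token[0][::2])'. – PaulMcG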
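The slicing idea from the comment above might look like this when the operands are sets rather than booleans, so intersection and union are swapped in for `all` and `any`. This is only a sketch: the function names are hypothetical, and the nested list merely imitates the token shape described in the comment without actually running pyparsing.

```python
def intersect_all(tokens):
    """Combine 'a and b and c ...' operands with set intersection."""
    operands = tokens[0][::2]  # every other item, skipping the operator strings
    return set.intersection(*operands)


def union_all(tokens):
    """Combine 'a or b or c ...' operands with set union."""
    operands = tokens[0][::2]
    return set.union(*operands)


# Imitation of what a chained expression would hand to the parse action.
and_tokens = [[{0, 1, 2}, 'and', {0, 2}, 'and', {2, 3}]]
or_tokens = [[{0, 1}, 'or', {2}, 'or', {1, 3}]]

and_result = intersect_all(and_tokens)  # {2}
or_result = union_all(or_tokens)        # {0, 1, 2, 3}
```

Taking every second item works for any number of chained operands, which is why it copes with three or more conditions joined by the same operator.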