Querying a chemical elements database with pyparsing

I would like to parse queries against a chemical elements database.

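The database is stored in an XML file. Parsing that file produces a nested dictionary that is held in a singleton object inheriting from collections.OrderedDict. Asking for an element gives me an ordered dictionary of its properties (i.e. ELEMENTS['C'] -> {'name': 'carbon', 'neutron': 0, 'proton': 6, ...}). Conversely, asking for a property gives me an ordered dictionary of its values for all the elements (i.e. ELEMENTS['proton'] -> {'H': 1, 'He': 2, ...}).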
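To make the two kinds of lookup concrete, here is a minimal sketch of how such a container might behave. The class name ElementsDatabase, the use of `__missing__` and the three sample entries are assumptions made for illustration; the real singleton is built from the XML file and may work differently.

```python
from collections import OrderedDict


class ElementsDatabase(OrderedDict):
    """Hypothetical stand-in for the ELEMENTS singleton (not the real class)."""

    @property
    def properties(self):
        # Collect property names in first-seen order across all elements.
        names = OrderedDict()
        for props in self.values():
            for name in props:
                names[name] = None
        return list(names)

    def __missing__(self, key):
        # Reached only when `key` is not an element symbol:
        # treat it as a property name and gather its value per element.
        if key not in self.properties:
            raise KeyError(key)
        return OrderedDict(
            (symbol, props[key]) for symbol, props in self.items() if key in props
        )


ELEMENTS = ElementsDatabase()
ELEMENTS['H'] = {'name': 'hydrogen', 'proton': 1}
ELEMENTS['He'] = {'name': 'helium', 'proton': 2}
ELEMENTS['C'] = {'name': 'carbon', 'proton': 6}

by_element = ELEMENTS['C']        # properties of one element
by_property = ELEMENTS['proton']  # one property across all elements
```

Looking up a symbol returns that element's properties, while looking up a property name falls through to `__missing__` and returns a symbol-to-value mapping in insertion order.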

A typical query could be:

mass > 10 or (nucleon < 20 and atomic_radius < 5)

where each "sub-query" (e.g. mass > 10) returns the set of elements that match it.

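The query is then transformed internally into a string, which is evaluated to produce the set of indexes of the elements that match it. In this context, the and/or operators are not boolean operators but set operators acting on Python sets.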
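The intended semantics can be sketched with plain Python sets, independently of any parser: "and" becomes set intersection (`&`) and "or" becomes set union (`|`). The TABLE dictionary, the select helper and the placeholder symbols X1 to X3 are hypothetical, and the numbers are illustrative values rather than real element data.

```python
import operator

# Illustrative placeholder values, not real element data.
TABLE = {
    'X1': {'mass': 4, 'nucleon': 4, 'atomic_radius': 3},
    'X2': {'mass': 12, 'nucleon': 12, 'atomic_radius': 7},
    'X3': {'mass': 8, 'nucleon': 25, 'atomic_radius': 2},
}


def select(prop, compare, value):
    """Return the set of symbols whose `prop` satisfies compare(prop_value, value)."""
    return {symbol for symbol, props in TABLE.items() if compare(props[prop], value)}


# mass > 10 or (nucleon < 20 and atomic_radius < 5)
result = select('mass', operator.gt, 10) | (
    select('nucleon', operator.lt, 20) & select('atomic_radius', operator.lt, 5)
)
```

With these placeholder values the result is the set containing X1 and X2: X2 satisfies the mass condition, and X1 satisfies both conditions inside the parentheses.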

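I recently posted a question about building such queries. Thanks to the useful answers I got, I think I have more or less done what I wanted (I hope in a nice way!), but I still have some questions related to pyparsing.

Here is my code: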

import numpy

from pyparsing import *

# This imports a singleton object storing the database dictionary as
# described earlier
from ElementsDatabase import ELEMENTS

and_operator = oneOf(['and', '&'], caseless=True)
or_operator = oneOf(['or', '|'], caseless=True)

# ELEMENTS.properties is a property getter that returns the list of
# registered properties in the database
props = oneOf(ELEMENTS.properties, caseless=True)

# A property keyword can be quoted or not.
props = Suppress('"') + props + Suppress('"') | props
# When parsed, it must be replaced by the following expression that
# will be evaluated later.
props.setParseAction(lambda t: "numpy.array(ELEMENTS['%s'].values())" % t[0].lower())

quote = QuotedString('"')
integer = Regex(r'[+-]?\d+').setParseAction(lambda t: int(t[0]))
float_ = Regex(r'[+-]?(\d+(\.\d*)?)?([eE][+-]?\d+)?').setParseAction(lambda t: float(t[0]))

comparison_operator = oneOf(['==', '!=', '>', '>=', '<', '<='])
comparison_expr = props + comparison_operator + (quote | float_ | integer)
# The comparison goes inside numpy.where; [0] selects the array of matching indexes.
comparison_expr.setParseAction(lambda t: "set(numpy.where(%s%s%s)[0])" % tuple(t))

grammar = Combine(operatorPrecedence(comparison_expr, [(and_operator, 2, opAssoc.LEFT), (or_operator, 2, opAssoc.LEFT)]))

# A test query
res = grammar.parseString('"mass  " > 30 or (nucleon == 1)', parseAll=True)

print eval(' '.join(res._asStringList()))

My questions are the following:

1 Using 'transformString' instead of 'parseString' never raises any
    exception, even when the string to be parsed does not match the grammar.
    However, raising an exception in that case is exactly the functionality I need. Is there a way to get it?

2 I would like to reintroduce white space between my tokens so
that my eval does not fail. The only way I found to do so is the one
implemented above. Do you see a better way to do it with pyparsing?
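The difference raised in the first question can be reproduced with a toy grammar. This is a minimal sketch, not the grammar from the post: the comparison expression and the is_valid helper are hypothetical names. It relies on parseString with parseAll=True raising ParseException on a mismatch, whereas transformString only rewrites the parts that match and copies everything else through untouched.

```python
from pyparsing import ParseException, Word, alphas, nums, oneOf

# Toy grammar for illustration: <name> <comparison operator> <digits>
comparison = Word(alphas + '_') + oneOf('== != >= <= > <') + Word(nums)


def is_valid(query):
    """Return True only if the whole query matches the grammar."""
    try:
        comparison.parseString(query, parseAll=True)
    except ParseException:
        return False
    return True


ok = is_valid('mass > 10')
bad = is_valid('mass ?? 10')

# transformString raises nothing here; the non-matching text comes back as-is.
untouched = comparison.transformString('mass ?? 10')
```

One possible arrangement, under these assumptions, is to validate with parseString first and only then call transformString.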

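Sorry for the long post, but I wanted to describe the context in more detail. By the way, if you find this approach bad, do not hesitate to tell me!

Thank you very much for your help.

Eric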


2012-11-02 Eurydice

Never mind my question, I found a workaround. I used the SimpleBool.py example shipped with pyparsing (thanks for the hint, Paul).

Basically, I used the following approach:

1 for each sub-query (e.g. mass > 10), using the setParseAction method,
I attached a function that returns the set of elements that match
the sub-query

2 then, I attached the following functions to each logical operator (and,
or and not):

def not_operator(token):

    _, s = token[0]

    # ELEMENTS is the singleton described in my original post
    return set(ELEMENTS.keys()).difference(s)

def and_operator(token):

    s1, _, s2 = token[0]

    # Set intersection; the boolean "and" would just return one of the operands
    return s1 & s2

def or_operator(token):

    s1, _, s2 = token[0]

    # Set union; the boolean "or" would just return one of the operands
    return s1 | s2

# Thanks to Paul for the hint.
grammar = operatorPrecedence(comparison_expr,
      [(not_token, 1, opAssoc.RIGHT, not_operator),
      (and_token, 2, opAssoc.LEFT, and_operator),
      (or_token, 2, opAssoc.LEFT, or_operator)])

Please note that these operators act upon Python sets rather than
on booleans.
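Step 1 is not shown in the code above, so here is a minimal sketch of what such a parse action might look like. The PROPERTIES dictionary, the OPERATORS table and the match_elements function are hypothetical names, the numbers are illustrative placeholders rather than real element data, and only integer values are handled to keep the example short.

```python
import operator

from pyparsing import Word, nums, oneOf

# Hypothetical stand-in for the database: property -> {symbol: value}.
# The numbers are illustrative placeholders, not real element data.
PROPERTIES = {
    'mass': {'X1': 4, 'X2': 12, 'X3': 8},
    'nucleon': {'X1': 4, 'X2': 12, 'X3': 25},
}

OPERATORS = {
    '==': operator.eq, '!=': operator.ne,
    '>': operator.gt, '>=': operator.ge,
    '<': operator.lt, '<=': operator.le,
}


def match_elements(tokens):
    """Parse action: replace 'prop op value' by the set of matching symbols."""
    prop, op, value = tokens
    compare = OPERATORS[op]
    return set(symbol for symbol, v in PROPERTIES[prop].items() if compare(v, value))


integer = Word(nums).setParseAction(lambda t: int(t[0]))
comparison_expr = oneOf(list(PROPERTIES)) + oneOf(list(OPERATORS)) + integer
comparison_expr.setParseAction(match_elements)

matched = comparison_expr.parseString('mass > 10', parseAll=True)[0]
```

Because the parse action returns the set directly, no string has to be assembled and passed to eval afterwards.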

And that does the job.

I hope this approach helps some of you.

Eric


2012-11-03 15:12:13 Eurydice

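Nicely done! Glad you were able to work this out. This is very similar to the search engine query parser in "Getting Started with Pyparsing". – PaulMcG

Oh, by the way: if you try "cond_a and cond_b and cond_c", you will get '[[cond_a, 'and', cond_b, 'and', cond_c]]' passed to your parse action. The easiest way to handle this case is with slicing: change and_operator to 'return all(token[0][::2])' and or_operator to 'return any(token[0][::2])'. – PaulMcG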
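The all/any versions in the comment return booleans. If the operands are Python sets, as the answer states, one possible adaptation of the same slicing idea is to intersect or unite every operand found at the even positions. This is a sketch under that assumption, not code taken from the thread.

```python
def and_operator(token):
    # token[0] looks like [s1, 'and', s2, 'and', s3, ...];
    # the operands sit at the even indexes.
    operands = list(token[0])[::2]
    return set.intersection(*operands)


def or_operator(token):
    # Same layout with 'or' between the operands.
    operands = list(token[0])[::2]
    return set.union(*operands)


# Plain nested lists stand in for the tokens that pyparsing would pass.
common = and_operator([[{'H', 'He', 'C'}, 'and', {'He', 'C'}, 'and', {'C', 'O'}]])
merged = or_operator([[{'H'}, 'or', {'He'}, 'or', {'C'}]])
```

Here common is the set containing only C, and merged contains H, He and C, so chains of any length are handled without unpacking exactly three tokens.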
