2015-11-04

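Pyparsing/Python: nesting problem when converting binary Boolean expressions to XML (Python 2.7.10)

I need to parse nested binary Boolean expressions into an XML tree. For example, take the expression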

expression2 = ("((Param1 = 1 AND Param2 = 1) "
               "OR (Param3 = 1 AND Param4 = 1)) "
               "AND "
               "(((Param5 = 0 AND Param6 = 1) "
               "OR(Param7 = 0 AND Param8 = 1)) "
               "AND "
               "((Param9 = 0 AND Param10 = 1) "
               "OR(Param11 = 0 AND Param12 = 1)))")

This is essentially a combination of (Expression) (Operator) (Expression) clauses.

I need the output to be a combination of these expressions wrapped in the appropriate XML tags, i.e.

<MainBody> 
      <FirstExpression> 
      Parameter 
      </FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression> 
      1 
      </SecondExpression> 
     </MainBody> 

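where FirstExpression can be either a parameter or a MainBody (this is where the nesting comes in), the operator is always one of =, <, >, AND, OR, and SecondExpression is either an integer or a MainBody.

There will always be groups of three: the smallest discrete object consists of a first expression, an operator, and a second expression.

The code I've come up with (this is my first time using Python) gets me part of the way there.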

import pyparsing as pp 
import xml.etree.ElementTree as ET 


operator = pp.Regex(">=|<=|!=|>|<|=").setName("operator").setResultsName("Operator") 
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?").setResultsName("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".").setName("FirstExpression").setResultsName("FirstExpression") 
comparison_term = identifier | number 
condition = pp.Group(comparison_term + operator + comparison_term).setResultsName("MainBody") 


expr = pp.operatorPrecedence(condition,[ 
          ("NOT", 1, pp.opAssoc.RIGHT,), 
          ("AND", 2, pp.opAssoc.LEFT,), 
          ("OR", 2, pp.opAssoc.LEFT,), 
          ]) 


expression2 = ("((Param1 = 1 AND Param2 = 1) "
               "OR (Param3 = 1 AND Param4 = 1)) "
               "AND "
               "(((Param5 = 0 AND Param6 = 1) "
               "OR(Param7 = 0 AND Param8 = 1)) "
               "AND "
               "((Param9 = 0 AND Param10 = 1) "
               "OR(Param11 = 0 AND Param12 = 1)))")



out = expr.parseString(expression2) 
text = out.asXML() 

f = open('rules.xml','w+') 
f.write(text) 
f.close() 

root = ET.parse("rules.xml").getroot() 

print ET.tostring(root) 

This outputs XML of this form:

<ITEM> 
    <ITEM> 
    <ITEM> 
     <MainBody> 
     <MainBody> 
      <FirstExpression>Param1</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     <ITEM>AND</ITEM> 
     <MainBody> 
      <FirstExpression>Param2</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     </MainBody> 
     <ITEM>OR</ITEM> 
     <MainBody> 
     <MainBody> 
      <FirstExpression>Param3</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     <ITEM>AND</ITEM> 
     <MainBody> 
      <FirstExpression>Param4</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     </MainBody> 
    </ITEM> 
    <ITEM>AND</ITEM> 
    <ITEM> 
     <ITEM> 
     <MainBody> 
      <MainBody> 
      <FirstExpression>Param5</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
      </MainBody> 
      <ITEM>AND</ITEM> 
      <MainBody> 
      <FirstExpression>Param6</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
      </MainBody> 
     </MainBody> 
     <ITEM>OR</ITEM> 
     <MainBody> 
      <MainBody> 
      <FirstExpression>Param7</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
      </MainBody> 
      <ITEM>AND</ITEM> 
      <MainBody> 
      <FirstExpression>Param8</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
      </MainBody> 
     </MainBody> 
     </ITEM> 
     <ITEM>AND</ITEM> 
     <ITEM> 
     <MainBody> 
      <MainBody> 
      <FirstExpression>Param9</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
      </MainBody> 
      <ITEM>AND</ITEM> 
      <MainBody> 
      <FirstExpression>Param10</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
      </MainBody> 
     </MainBody> 
     <ITEM>OR</ITEM> 
     <MainBody> 
      <MainBody> 
      <FirstExpression>Param11</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
      </MainBody> 
      <ITEM>AND</ITEM> 
      <MainBody> 
      <FirstExpression>Param12</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
      </MainBody> 
     </MainBody> 
     </ITEM> 
    </ITEM> 
    </ITEM> 
</ITEM> 

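Clearly this is not what I want: the only objects with meaningful tags are at the deepest level. I need the tagging to go as deep as the rule requires, however large the rule is. Essentially, I want a binary tree built from sets of MainBody, FirstExpression, Operator, and SecondExpression.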

I also need to put the integer values inside their own tags, which is another thing I haven't figured out how to do.
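To make the target shape concrete, here is a minimal sketch that turns nested [left, operator, right] lists into the tag layout described above, using only xml.etree.ElementTree. The helper names to_xml and add_operand, the leaf tags Parameter and Integer, and the hand-written input list are assumptions made for illustration; none of them come from pyparsing. The sketch also assumes every node has exactly three items.

```python
import xml.etree.ElementTree as ET


def to_xml(node):
    """Convert a [left, op, right] triple into a <MainBody> element."""
    left, op, right = node  # assumption: every node is exactly three items
    body = ET.Element("MainBody")
    add_operand(ET.SubElement(body, "FirstExpression"), left)
    ET.SubElement(body, "Operator").text = op
    add_operand(ET.SubElement(body, "SecondExpression"), right)
    return body


def add_operand(parent, operand):
    """Attach either a nested MainBody or a tagged leaf value to parent."""
    if isinstance(operand, list):
        parent.append(to_xml(operand))  # nested expression
    elif operand.lstrip("+-").isdigit():
        ET.SubElement(parent, "Integer").text = operand  # hypothetical tag
    else:
        ET.SubElement(parent, "Parameter").text = operand  # hypothetical tag


# Hand-written stand-in for a parsed "(Param1 = 1) OR (Param2 = 1)".
tree = [["Param1", "=", "1"], "OR", ["Param2", "=", "1"]]
root = to_xml(tree)
print(ET.tostring(root).decode())
```

Separating the tree walk from the parsing step means the tag names are decided in one place, whatever parser produces the nested lists.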

I think pyparsing should be able to do this somehow with groups, but I can't figure it out.

Can anyone offer advice on how to achieve this?

Thanks

Edit 2015-11-05:

Building on what Paul wrote, I came up with this code, which uses an (intended to be) recursive grammar:

import pyparsing as pp 


operator = pp.oneOf(">= <= != > < =")("operator") 
integer = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("integer")
parameter = pp.Word(pp.alphas, pp.alphanums + "_" + "." + "-")("parameter") 
comparison_term = parameter | integer 

firstExpression = pp.Forward() 
secondExpression = pp.Forward() 

mainbody = pp.Group(firstExpression + operator + secondExpression)("Mainbody") 

firstExpression << pp.Group(parameter | pp.Optional(mainbody))("FirstExpression") 
secondExpression << pp.Group(integer | pp.Optional(mainbody))("SecondExpression") 

AND_ = pp.Keyword("AND")("operator") 
OR_ = pp.Keyword("OR")("operator") 
NOT_ = pp.Keyword("NOT")("operator") 

expr = pp.operatorPrecedence(mainbody,[ 
          (NOT_, 1, pp.opAssoc.RIGHT,), 
          (AND_, 2, pp.opAssoc.LEFT,), 
          (OR_, 2, pp.opAssoc.LEFT,), 
          ]) 

# undocumented hack to assign a results name to (expr) - RED FLAG 
expr.expr.resultsName = "Mainbody" 

expression1 = ("((Param1 = 1) "
               "OR (Param2 = 1))")

out = expr.parseString(expression1)[0] # extract item 0 from single-item list 
text = out.asXML("Mainbody") # add tag for outermost element 
print text 

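This recurses infinitely. Changing | to + in the firstExpression and secondExpression lines fixes that, but I believe it means the parser never looks for a Mainbody to group.

I've included a simplified rule so that I can show the exact output I'm trying to get.

This code produces: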

<Mainbody> 
    <Mainbody> 
    <FirstExpression> 
     <parameter>Param1</parameter> 
    </FirstExpression> 
    <operator>=</operator> 
    <SecondExpression> 
     <integer>1</integer> 
    </SecondExpression> 
    </Mainbody> 
    <operator>OR</operator> 
    <Mainbody> 
    <FirstExpression> 
     <parameter>Param2</parameter> 
    </FirstExpression> 
    <operator>=</operator> 
    <SecondExpression> 
     <integer>1</integer> 
    </SecondExpression> 
    </Mainbody> 
</Mainbody> 

I'm trying to get

<Mainbody> 
    <FirstExpression> 
    <Mainbody> 
     <FirstExpression> 
     <parameter>Param1</parameter> 
     </FirstExpression> 
     <operator>=</operator> 
     <SecondExpression> 
     <integer>1</integer> 
     </SecondExpression> 
    </Mainbody> 
    </FirstExpression> 
    <operator>OR</operator> 
    <SecondExpression> 
    <Mainbody> 
     <FirstExpression> 
     <parameter>Param2</parameter> 
     </FirstExpression> 
     <operator>=</operator> 
     <SecondExpression> 
     <integer>1</integer> 
     </SecondExpression> 
    </Mainbody> 
    </SecondExpression> 
    </Mainbody> 

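It looks to me like the problem is that the parser isn't correctly tagging/recognizing/grouping a Mainbody as a FirstExpression or SecondExpression. I've tried adjusting the grammar and often get infinite recursion, so I have a feeling something is wrong with my grammar definition. I need to handle any number of binary groupings (PARAMETER = INTEGER) joined by AND/OR.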
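One way to get strict triples out of a chain such as A AND B AND C is to fold the flat sequence from left to right after parsing. As far as I understand, operatorPrecedence returns a chain of the same left-associative operator as one flat group rather than as nested pairs, so the sketch below assumes that shape. to_binary is a hypothetical helper that works on plain Python lists (for example, what asList() gives you), and it deliberately ignores unary operators such as NOT.

```python
def to_binary(node):
    """Fold [a, op, b, op, c, ...] into [[a, op, b], op, c], recursively."""
    if not isinstance(node, list):
        return node  # a leaf such as "Param1" or "1"
    # Operands sit at even positions, binary operators at odd positions.
    operands = [to_binary(item) for item in node[0::2]]
    operators = node[1::2]
    result = operands[0]
    for op, right in zip(operators, operands[1:]):
        result = [result, op, right]  # left-associative nesting
    return result


flat = [["Param1", "=", "1"], "AND", ["Param2", "=", "1"],
        "AND", ["Param3", "=", "1"]]
nested = to_binary(flat)
print(nested)
```

After this step every list has exactly three items, so each one maps onto a first expression, an operator, and a second expression.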

Any suggestions?

Thanks

Answer


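Here is your code with just a few changes:

  • changed "AND", "OR", and "NOT" to Keyword expressions with the results name "operator", so that they get wrapped in <operator> tags
  • hacked a results name onto the internal expression of the expr created by operatorPrecedence (which was recently renamed to infixNotation)
  • extracted the 0th element from the single-item list returned by parseString
  • added an outermost tag name in the call to asXML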

operator = pp.oneOf(">= <= != > < =")("Operator") 
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")("SecondExpression")
identifier = pp.Word(pp.alphas, pp.alphanums + "_" + ".")("FirstExpression") 
comparison_term = identifier | number 
condition = pp.Group(comparison_term + operator + comparison_term)("MainBody") 

# define AND, OR, and NOT as keywords, with "operator" results names 
AND_ = pp.Keyword("AND")("operator") 
OR_ = pp.Keyword("OR")("operator") 
NOT_ = pp.Keyword("NOT")("operator") 

expr = pp.operatorPrecedence(condition,[ 
          (NOT_, 1, pp.opAssoc.RIGHT,), 
          (AND_, 2, pp.opAssoc.LEFT,), 
          (OR_, 2, pp.opAssoc.LEFT,), 
          ]) 

# undocumented hack to assign a results name to (expr) - RED FLAG 
expr.expr.resultsName = "group" 

expression2 = ("((Param1 = 1 AND Param2 = 1) "
               "OR (Param3 = 1 AND Param4 = 1)) "
               "AND "
               "(((Param5 = 0 AND Param6 = 1) "
               "OR(Param7 = 0 AND Param8 = 1)) "
               "AND "
               "((Param9 = 0 AND Param10 = 1) "
               "OR(Param11 = 0 AND Param12 = 1)))")



out = expr.parseString(expression2)[0] # extract item 0 from single-item list 
text = out.asXML("expression") # add tag for outermost element 
print text 

Prints:

<expression> 
    <group> 
    <group> 
     <MainBody> 
     <FirstExpression>Param1</FirstExpression> 
     <Operator>=</Operator> 
     <SecondExpression>1</SecondExpression> 
     </MainBody> 
     <operator>AND</operator> 
     <MainBody> 
     <FirstExpression>Param2</FirstExpression> 
     <Operator>=</Operator> 
     <SecondExpression>1</SecondExpression> 
     </MainBody> 
    </group> 
    <operator>OR</operator> 
    <group> 
     <MainBody> 
     <FirstExpression>Param3</FirstExpression> 
     <Operator>=</Operator> 
     <SecondExpression>1</SecondExpression> 
     </MainBody> 
     <operator>AND</operator> 
     <MainBody> 
     <FirstExpression>Param4</FirstExpression> 
     <Operator>=</Operator> 
     <SecondExpression>1</SecondExpression> 
     </MainBody> 
    </group> 
    </group> 
    <operator>AND</operator> 
    <group> 
    <group> 
     <group> 
     <MainBody> 
      <FirstExpression>Param5</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
     </MainBody> 
     <operator>AND</operator> 
     <MainBody> 
      <FirstExpression>Param6</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     </group> 
     <operator>OR</operator> 
     <group> 
     <MainBody> 
      <FirstExpression>Param7</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
     </MainBody> 
     <operator>AND</operator> 
     <MainBody> 
      <FirstExpression>Param8</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     </group> 
    </group> 
    <operator>AND</operator> 
    <group> 
     <group> 
     <MainBody> 
      <FirstExpression>Param9</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
     </MainBody> 
     <operator>AND</operator> 
     <MainBody> 
      <FirstExpression>Param10</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     </group> 
     <operator>OR</operator> 
     <group> 
     <MainBody> 
      <FirstExpression>Param11</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>0</SecondExpression> 
     </MainBody> 
     <operator>AND</operator> 
     <MainBody> 
      <FirstExpression>Param12</FirstExpression> 
      <Operator>=</Operator> 
      <SecondExpression>1</SecondExpression> 
     </MainBody> 
     </group> 
    </group> 
    </group> 
</expression> 

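So you are definitely on the right track, and this should get you going. But I think the fact that we had to hack a results name onto an undocumented internal member variable of expr is a red flag, and it is likely that you will soon reach the limits of what you can do with operatorPrecedence.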

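You may have to implement your own recursive parser to get full control over how all the elements and sub-elements are named. You may even need to implement your own version of asXML() to control whether you get intermediate levels, such as the <group> tags shown above.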
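As a rough illustration of what such a hand-written recursive parser could look like, here is a minimal recursive-descent sketch that builds the XML elements directly while parsing, using only the standard library. Everything in it (the TOKEN pattern, the Parser class, the body and leaf helpers) is hypothetical and not part of pyparsing. It assumes OR binds more loosely than AND, that every comparison is parameter, operator, integer, and it does no real error reporting or NOT handling.

```python
import re
import xml.etree.ElementTree as ET

# Assumed token set: comparison operators, parentheses, names, integers.
TOKEN = re.compile(r">=|<=|!=|[()<>=]|[A-Za-z][\w.]*|[+-]?\d+")


def leaf(tag, text):
    """Build a leaf element such as <parameter>Param1</parameter>."""
    elem = ET.Element(tag)
    elem.text = text
    return elem


def body(left, op, right):
    """Wrap two operand elements and an operator in a <Mainbody>."""
    node = ET.Element("Mainbody")
    ET.SubElement(node, "FirstExpression").append(left)
    ET.SubElement(node, "operator").text = op
    ET.SubElement(node, "SecondExpression").append(right)
    return node


class Parser(object):
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def parse_or(self):
        # or_expr := and_expr ("OR" and_expr)*
        node = self.parse_and()
        while self.peek() == "OR":
            self.take()
            node = body(node, "OR", self.parse_and())
        return node

    def parse_and(self):
        # and_expr := atom ("AND" atom)*
        node = self.parse_atom()
        while self.peek() == "AND":
            self.take()
            node = body(node, "AND", self.parse_atom())
        return node

    def parse_atom(self):
        # atom := "(" or_expr ")" | parameter operator integer
        if self.peek() == "(":
            self.take()
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        name, op, value = self.take(), self.take(), self.take()
        return body(leaf("parameter", name), op, leaf("integer", value))


root = Parser("((Param1 = 1) OR (Param2 = 1))").parse_or()
print(ET.tostring(root).decode())
```

Because each loop consumes a token before recursing, the parser cannot loop forever on the same input position, and every operator produces exactly one Mainbody with a FirstExpression and a SecondExpression.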


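I really appreciate the help. This definitely gets me further, but I'm still not there. I tried to implement a recursive grammar. In addition to trying your solution, I've added some extra information to my original question.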