Pyparsing: parsing parentheses

I am trying to parse the following lines as unique longest matches:

command(grep -o '(' file.txt) 
command(ls -1)

with pyparsing. The commands do not extend over multiple lines. My first idea for the rule was

cmd = "command(" + pp.OneOrMore(pp.Word(pp.printables)) + ")"

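But because pp.printables also contains (and should contain) the closing parenthesis ")", pyparsing fails to parse the command. Can I force pyparsing to match the longest possible command string that is still followed by a closing parenthesis?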

Source

2017-09-01 DangerRanger

Looking at your question, I first created a small script containing your sample text, the parser, and a call to runTests:

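import pyparsing as pp

# Sample input, one test per line (runTests strips surrounding whitespace)
tests = """\
    command(grep -o '(' file.txt)
    command(ls -1)
    """

# First attempt: any run of printable words between "command(" and ")"
cmd = "command(" + pp.OneOrMore(pp.Word(pp.printables)) + ")"
cmd.runTests(tests)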

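As you said, this fails because the terminating ")" gets included in the OneOrMore repetition. (runTests is useful here, because it either shows the parsed results or places a marker where the parser went astray.)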

command(grep -o '(' file.txt) 
                             ^
FAIL: Expected ")" (at char 29), (line:1, col:30) 

command(ls -1) 
              ^
FAIL: Expected ")" (at char 14), (line:1, col:15)

This happens because pyparsing works purely left-to-right, with no implicit lookahead.
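To make the left-to-right behavior concrete, here is a minimal sketch (the input string is just an illustration, and the camelCase method names follow the rest of this answer). A Word built from all printables swallows the trailing ")" together with the word before it, so nothing is left for the literal ")" that follows:

```python
import pyparsing as pp

word = pp.Word(pp.printables)

# The word is greedy: ")" is a printable character, so it is consumed too.
greedy = word.parseString("file.txt)").asList()
print(greedy)  # ['file.txt)']

# With nothing left to match, the literal ")" fails. Pyparsing does not
# backtrack to give the ")" back.
try:
    (word + ")").parseString("file.txt)")
    failed = False
except pp.ParseException as err:
    failed = True
    print(err)
```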

The simplest direct fix is to exclude ")" from the set of printables that your words can be made from:

cmd = "command(" + pp.OneOrMore(pp.Word(pp.printables, excludeChars=")")) + ")"

This now gives successful output:

command(grep -o '(' file.txt) 
['command(', 'grep', '-o', "'('", 'file.txt', ')'] 

command(ls -1) 
['command(', 'ls', '-1', ')']

But if I add a different test string to your tests:

command(grep -o ')' file.txt)

the ')' is mistaken for the closing right parenthesis:

command(grep -o ')' file.txt) 
                  ^
FAIL: Expected end of text (at char 18), (line:1, col:19)

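Typically, when including a "read until X" kind of pyparsing expression, we need to make sure that an X inside quotes is not misinterpreted as the actual X.

One way to do this is to preempt the match of printable words by looking for quoted strings first: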

cmd = "command(" + pp.OneOrMore(pp.quotedString | 
           pp.Word(pp.printables, excludeChars=")")) + ")"

Now our quoted right parenthesis is properly skipped over as a quoted string:

command(grep -o ')' file.txt) 
['command(', 'grep', '-o', "')'", 'file.txt', ')']
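The order of the alternatives matters here, because "|" in pyparsing tries them left to right and takes the first that matches. As a small sketch (the name `arg` is introduced only for illustration), the quoted-string alternative comes first so that it claims `')'` as a whole token before the plain word gets a chance to stop at the parenthesis:

```python
import pyparsing as pp

# Try a quoted string first; fall back to a word that cannot contain ")".
arg = pp.quotedString | pp.Word(pp.printables, excludeChars=")")
cmd = "command(" + pp.OneOrMore(arg) + ")"

tokens = cmd.parseString("command(grep -o ')' file.txt)").asList()
print(tokens)

# The arguments are everything between the first and last token.
args = tokens[1:-1]
print(args)
```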

But there are still many possible corner cases that could trip up this parser, so it may be simpler to use a pyparsing SkipTo expression:

cmd = "command(" + pp.SkipTo(")", ignore=pp.quotedString) + ")"

For which runTests gives:

command(grep -o '(' file.txt) 
['command(', "grep -o '(' file.txt", ')'] 

command(ls -1) 
['command(', 'ls -1', ')'] 

command(grep -o ')' file.txt) 
['command(', "grep -o ')' file.txt", ')']

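Note that we also had to explicitly tell SkipTo to step over any ")" characters that might be inside quoted strings. Also, the body of our command arguments is now returned as a single string.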
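To see what the ignore argument buys, the following sketch compares SkipTo with and without it on the quoted-parenthesis test line. The variable names and the "body" results name are illustrative choices, not part of the original parser:

```python
import pyparsing as pp

line = "command(grep -o ')' file.txt)"

# Without ignore, SkipTo stops at the first ")" it sees, even inside quotes.
naive = "command(" + pp.SkipTo(")")("body") + ")"
naive_body = naive.parseString(line)["body"]
print(repr(naive_body))  # "grep -o '"

# With ignore, quoted strings are stepped over while scanning for ")".
careful = "command(" + pp.SkipTo(")", ignore=pp.quotedString)("body") + ")"
careful_body = careful.parseString(line)["body"]
print(repr(careful_body))
```

Attaching a results name such as "body" is one way to pull out the command text without relying on token positions.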

If your command body may itself contain parenthesized values, then we will still trip over them. Look at this test:

command(grep -x '|'.join(['(', ')']) file.txt)

runTests again shows us that we have been misled by a ")" that we did not want to end on:

command(grep -x '|'.join(['(', ')']) file.txt) 
                                     ^
FAIL: Expected end of text (at char 37), (line:1, col:38)

You can add a lookahead on the ")" to tell SkipTo to match only a ")" that comes right before the end of the string:

cmd = "command(" + pp.SkipTo(")" + pp.FollowedBy(pp.StringEnd()), 
          ignore=pp.quotedString) + ")"

But with this parser, we have really just reverted to something you could do just as well with string indexing, splitting, and strip methods.
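For comparison, a plain-Python sketch of that string-method approach might look like the following. The function name `command_body` is hypothetical, and it assumes each line holds exactly one command that ends at the last ")":

```python
def command_body(line):
    """Return the text between 'command(' and the final ')', or None."""
    line = line.strip()
    prefix = "command("
    if line.startswith(prefix) and line.endswith(")"):
        # Slice off the prefix and the single closing parenthesis.
        return line[len(prefix):-1]
    return None


body = command_body("  command(grep -x '|'.join(['(', ')']) file.txt)  ")
print(body)  # grep -x '|'.join(['(', ')']) file.txt
```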

One last version, to show you the use of pyparsing's nestedExpr, which will help in the case where you have nested parentheses within your argument list:

cmd = "command" + pp.originalTextFor(pp.nestedExpr())

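Normally, nestedExpr returns the parsed content as a nested list of string lists, but by wrapping it in originalTextFor, we get the original text. Also note that we removed the "(" from the opening "command(", since nestedExpr uses it to parse its own opening parenthesis. These are the results: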

command(grep -o '(' file.txt) 
['command', "(grep -o '(' file.txt)"] 

command(ls -1) 
['command', '(ls -1)'] 

command(grep -o ')' file.txt) 
['command', "(grep -o ')' file.txt)"] 

command(grep -x '|'.join(['(', ')']) file.txt) 
['command', "(grep -x '|'.join(['(', ')']) file.txt)"]
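As a brief illustration of the difference originalTextFor makes, the sketch below first parses a made-up nested input with a bare nestedExpr, then uses the wrapped version and strips the outer parentheses from the returned text. The sample inputs and variable names are assumptions for demonstration only:

```python
import pyparsing as pp

# Bare nestedExpr: each parenthesized level becomes a nested list.
nested = pp.nestedExpr().parseString("(a (b c) d)").asList()
print(nested)  # [['a', ['b', 'c'], 'd']]

# Wrapped in originalTextFor: the matched source text comes back as one string.
cmd = "command" + pp.originalTextFor(pp.nestedExpr())
raw = cmd.parseString("command(ls -1)")[1]
print(raw)  # (ls -1)

# Drop the outer parentheses to get just the command body.
body = raw[1:-1]
print(body)  # ls -1
```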

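Ultimately, the approach you take and the complexity of the parser you need will depend on the goals of your parser, but these examples should give you some ideas on how to extend from here.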

Source

2017-09-02 15:35:01 PaulMcG
