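Matching plain text with pyparsing

I can't figure out how to parse plain text (including its whitespace) while still being able to match special structures inside that text. Suppose you have a string like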

some plain text 
specialStructure 
plain text again

What I am trying to build is a parser that gives me

['some plain text\n', 'specialStructure', '\nplain text again']

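My first attempt was

import pyparsing as pp

s = "some plain text\nspecialStructure\nplain text again"

def join_words(toks):
    # Re-join the matched words with single spaces
    return ' '.join(toks)

struct = pp.Regex(r'specialStructure')
# A word is any run of letters that is not the special structure
word = ~struct + pp.Word(pp.alphas)
txt = pp.OneOrMore(word).addParseAction(join_words)
grammar = pp.ZeroOrMore(struct | txt)

result = grammar.parseString(s)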

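Although this gives me what I want in this particular case, the problem is that if the plain text contains newlines, tabs, or other kinds of whitespace, I end up with only words separated by single spaces...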
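To make the whitespace loss concrete, here is a small sketch that runs the same grammar on an input containing a tab and a blank line. The input string is made up for illustration; everything else is the grammar from the first attempt.

```python
import pyparsing as pp

def join_words(toks):
    # Words are re-joined with a single space, whatever separated them originally
    return ' '.join(toks)

struct = pp.Regex(r'specialStructure')
word = ~struct + pp.Word(pp.alphas)
txt = pp.OneOrMore(word).addParseAction(join_words)
grammar = pp.ZeroOrMore(struct | txt)

# Illustrative input: a tab and a blank line inside the plain text
s = "some\tplain\n\ntext\nspecialStructure\nplain text again"

tokens = grammar.parseString(s).asList()
print(tokens)
# ['some plain text', 'specialStructure', 'plain text again']
```

The tab and the newlines are skipped as default whitespace before each Word, so they never reach the parse action and cannot be restored afterwards.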

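How can I match plain text directly, up to the next special structure or the end of the input?

Update

A partial solution I found is to use the SkipTo class: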

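import pyparsing as pp

s = "some plain text\nspecialStructure\nplain text again"

struct = pp.Regex(r'specialStructure')
# Plain text is everything up to the next struct, or up to the end of the input
txt = pp.SkipTo(struct) | pp.SkipTo(pp.StringEnd(), include=True)
grammar = pp.ZeroOrMore(struct | txt)

result = grammar.parseString(s)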

The problem here is nested structures. Suppose you have a more complex string to parse, such as:

s = """ 
some plain text 
nestedStructureBegin 
    here we are inside a nested structure 
    nestedStructureBegin 
     bla bla 
    nestedStructureEnd 
nestedStructureEnd 
some bla bla again. 
""" 

import pyparsing as pp 

grammar = pp.Forward() 
begin = pp.Regex(r'nestedStructureBegin').suppress() 
end = pp.Regex(r'nestedStructureEnd').suppress() 
struct = begin + pp.Group(grammar) + end 
keyword = begin | end 
txt = pp.SkipTo(keyword) | pp.SkipTo(pp.StringEnd(), include=True) 
grammar << pp.ZeroOrMore(struct | txt) 

for parser in [struct, txt]: 
    parser.addParseAction(lambda toks: print(toks)) 

result = grammar.parseString(s)

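I think the problem comes from pp.StringEnd, which cannot match inside the nested structure, but I don't see what exactly is wrong with this... any suggestions?

Source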

2017-08-09 bluePhlavio

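I still don't understand what you are asking for. What exactly should the resulting list contain? – maestromusica

I want a pyparsing.ParseResults object that gives me the list written above when you call its asList() method. – bluePhlavio

Look into using 'scanString' or 'searchString', which let you parse your special structures and skip over the rest. With 'scanString' you also get the start and end locations of each match, so you can use string slicing to pull out the parts before and after. – PaulMcG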
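A minimal sketch of the scanString suggestion, for the flat (non-nested) case only. The helper name split_around is hypothetical and not part of pyparsing. It returns a plain Python list rather than a ParseResults object, so it only approximates what the question asks for.

```python
import pyparsing as pp

def split_around(text, struct):
    """Return plain-text slices and struct matches in document order.

    Hypothetical helper: slices `text` using the start/end locations
    that scanString reports for every match of `struct`.
    """
    pieces = []
    last = 0
    for tokens, start, end in struct.scanString(text):
        if start > last:
            pieces.append(text[last:start])  # plain text before the match
        pieces.extend(tokens.asList())       # the matched structure itself
        last = end
    if last < len(text):
        pieces.append(text[last:])           # trailing plain text
    return pieces

s = "some plain text\nspecialStructure\nplain text again"
struct = pp.Literal("specialStructure")

pieces = split_around(s, struct)
print(pieces)
# ['some plain text\n', 'specialStructure', '\nplain text again']
```

Because the plain text is taken by slicing the original string, newlines and tabs survive untouched. Nested structures would need extra bookkeeping that this sketch does not attempt.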

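I found a solution that works even with nested structures. The idea is to parse the input character by character and then use pp.Combine to rebuild the original plain text.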

s = """ 
some plain text 
begin 
    we are inside a nested structure 
    begin 
     some more depth 
    end 
end 
and finally some more bla bla... 
""" 

import pyparsing as pp 

grammar = pp.Forward() 
begin = pp.Regex(r'begin').suppress() 
end = pp.Regex(r'end').suppress() 
keyword = begin | end 
block = begin + pp.Group(grammar) + end 
char = ~keyword + pp.Regex(r'[\s\S]') 
chars = pp.OneOrMore(char) 
txt = pp.Combine(chars) 
grammar << pp.ZeroOrMore(block | txt) 

result = grammar.parseString(s)
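One caveat with the grammar above: the patterns r'begin' and r'end' also match inside longer words such as "pretend" or "weekend". A possible variant, sketched here as an assumption rather than as part of the original answer, adds \b word boundaries to the keywords. The input string and the strip_nested helper are made up for illustration. Since leading whitespace in the text tokens may differ between pyparsing versions, the helper strips each string before printing.

```python
import pyparsing as pp

grammar = pp.Forward()
# \b keeps the keywords from matching inside words like "pretend"
begin = pp.Regex(r'\bbegin\b').suppress()
end = pp.Regex(r'\bend\b').suppress()
keyword = begin | end
block = begin + pp.Group(grammar) + end
# Any single character (whitespace included) that does not start a keyword
char = ~keyword + pp.Regex(r'[\s\S]')
txt = pp.Combine(pp.OneOrMore(char))
grammar <<= pp.ZeroOrMore(block | txt)

def strip_nested(items):
    # Hypothetical helper: strip every string in a nested list
    return [strip_nested(i) if isinstance(i, list) else i.strip()
            for i in items]

s = "pretend begin weekend end"
result = grammar.parseString(s)
cleaned = strip_nested(result.asList())
print(cleaned)
# ['pretend', ['weekend']]
```

Without the word boundaries, the plain-text match would stop at the "end" inside "pretend" and parsing would halt early.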

Source

2017-08-11 09:49:41 bluePhlavio
