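Error when parsing repeated characters with pyparsing

I'm seeing unexpected behavior when parsing some text with pyparsing. I'm parsing semi-structured text line by line, and one of the lines may be a record separator made up entirely of '=' characters, like this:
'========= ========================================='
There may also be a blank line at this point, so I have to try both options. If I try to parse a line containing only whitespace with the definition pp.Word('=', min=10) (assuming import pyparsing as pp), I get IndexError: string index out of range instead of the pyparsing exception I would expect for a failed match. The definition pp.OneOrMore(pp.Word('=')) behaves as expected, so of course I'll use that in my code. My understanding is that these two definitions should be equivalent in this case, and that pyparsing should raise a ParseException rather than an IndexError. Am I missing something?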
import unittest
import pyparsing as pp


class Test(unittest.TestCase):

    def testSepDetail(self):
        verbose = True
        pattern1 = pp.Word('=', min=10)  # Throws IndexError: string index out of range
        pattern2 = pp.OneOrMore(pp.Word('='))  # Works as expected
        testPattern = pattern1
        testList = [
            ('=======================', '======================='),
            (' ', None)]
        for test in testList:
            text, expected = test
            result = self.harness2(testPattern, text, expected, verbose)

    def harness2(self, pattern, text, expected, verbose):
        '''
        Parse text with pattern, compare the first token with expected
        when one is given, and fail the test on ParseException.
        '''
        if verbose:
            print('\n---Test for: {0}'.format(text))
        try:
            result = pattern.parseString(text)
            if verbose:
                print('Parse successful.\n', result.dump())
            if expected:
                self.assertEqual(expected, result[0], "\n\tParse Successful, but data not as expected.\n\tExpected {0}\n\t but got {1}".format(expected, result[0]))
            return result
        except pp.ParseException as x:
            failmsg = "\n---Failed to parse string: {0}\n{1}".format(text, str(x))
            print(failmsg)
            self.fail(failmsg)


if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()
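
For a more compact view of the difference, here is a minimal sketch without the unittest scaffolding. It is not part of my original test, and the outcome helper is just a hypothetical name used for illustration. It runs both definitions against a separator line and a blank line and reports either the matched tokens or the name of the exception raised. IndexError is caught alongside ParseException purely so that the behavior described above shows up in the output; I would assume the result for the blank line depends on the pyparsing version installed.

```python
import pyparsing as pp


def outcome(pattern, text):
    """Return the matched tokens as a list, or the name of the exception raised.

    Hypothetical helper for illustration only.
    """
    try:
        return pattern.parseString(text).asList()
    except (pp.ParseException, IndexError) as exc:
        # IndexError is caught only to make the reported behavior visible;
        # a failed match would normally surface as ParseException.
        return type(exc).__name__


separator = '=' * 23   # a record-separator line
blank = ' '            # a line containing only whitespace

word_min = pp.Word('=', min=10)
one_or_more = pp.OneOrMore(pp.Word('='))

for name, pattern in [('Word(min=10)', word_min),
                      ('OneOrMore(Word)', one_or_more)]:
    for text in (separator, blank):
        print('{0:16} {1!r:27} -> {2}'.format(name, text, outcome(pattern, text)))
```

Both definitions should return the separator line as a single token. The interesting row is Word(min=10) against the blank line, which is where I see IndexError instead of ParseException.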
Forgot to mention: this is pyparsing 2.0.7.