2017-07-13 28 views

I'm new to pyparsing. I'm trying to parse some text, but I don't really understand pyparsing's behavior. Parsing text with Combine returns no results.

from pyparsing import * 

number = Word(nums) 
yearRange = Combine(number+"-"+number) 
copyright = Literal("Copyright (C)")+yearRange+Literal("CA. All Rights Reserved.") 
copyrightCombine = Combine(copyright) 
date = Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums)) 
time = Combine(Word(nums)+":"+Word(nums)+":"+Word(nums)) 
dateTime = Combine(date+time) 
pageNumber = Suppress(Literal("PAGE"))+number 
pageLine = Word(nums)+"Copyright (C) 1986-2014 CA. All Rights Reserved."+Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))+Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))+pageNumber 
pageLine2 = number+copyright+dateTime+pageNumber 
pageLine3 = Word(nums)+copyright+Combine(Word(nums)+"/"+Word(nums)+"/"+Word(nums))+Combine(Word(nums)+":"+Word(nums)+":"+Word(nums))+pageNumber 

test = "1 Copyright (C) 1986-2014 CA. All Rights Reserved.            07/05/17 10:58:56  PAGE 1241" 
print(pageLine.searchString(test)) 
print(copyright.searchString(test)) 
print(copyrightCombine.searchString(test)) 
print(pageLine2.searchString(test)) 
print(pageLine3.searchString(test)) 

Output:

[['1', 'Copyright (C) 1986-2014 CA. All Rights Reserved.', '07/05/17', '10:58:56', '1241']] 
[['Copyright (C)', '1986-2014', 'CA. All Rights Reserved.']] 
[] 
[] 
[['1', 'Copyright (C)', '1986-2014', 'CA. All Rights Reserved.', '07/05/17', '10:58:56', '1241']] 

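I would like to use the parser defined as pageLine2. For some reason, the copyrightCombine parser doesn't return any results either. It seems that whenever I use Combine() around expressions separated by whitespace, the parse fails to return a match.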

Answer


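I figured out that this behavior comes from the way Combine() works. It expects no whitespace between the tokens it joins, but that can be overridden.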

According to the documentation:

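Combine - joins all matched tokens into a single string, using specified joinString (default joinString=""); expects all matching tokens to be adjacent, with no intervening whitespace (can be overridden by specifying adjacent=False in constructor)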
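As a minimal sketch of what that means for the question's grammar, the snippet below compares the default Combine with one built with adjacent=False. The variable names (year_range, copyright_expr, date_time and so on) are renamed for illustration, and passing " " as the join string is an assumption made here to keep the spacing readable; with the default empty join string the pieces would be concatenated with nothing between them.

```python
from pyparsing import Combine, Literal, Suppress, Word, nums

number = Word(nums)
year_range = Combine(number + "-" + number)
copyright_expr = (
    Literal("Copyright (C)") + year_range + Literal("CA. All Rights Reserved.")
)

# Default Combine: the contained tokens must be adjacent, so the space
# after "Copyright (C)" makes the match fail.
strict = Combine(copyright_expr)

# adjacent=False allows whitespace between the tokens; the second
# argument is the join string used to glue the matched pieces together.
relaxed = Combine(copyright_expr, " ", adjacent=False)

text = "Copyright (C) 1986-2014 CA. All Rights Reserved."
strict_result = strict.searchString(text).asList()
relaxed_result = relaxed.searchString(text).asList()
print(strict_result)   # []
print(relaxed_result)  # [['Copyright (C) 1986-2014 CA. All Rights Reserved.']]

# The same idea applied to the date/time part of pageLine2 from the question.
date_expr = Combine(number + "/" + number + "/" + number)
time_expr = Combine(number + ":" + number + ":" + number)
date_time = Combine(date_expr + time_expr, " ", adjacent=False)
page_number = Suppress(Literal("PAGE")) + number
page_line = number + copyright_expr + date_time + page_number

test = (
    "1 Copyright (C) 1986-2014 CA. All Rights Reserved."
    "            07/05/17 10:58:56  PAGE 1241"
)
page_result = page_line.searchString(test).asList()
print(page_result)
# [['1', 'Copyright (C)', '1986-2014', 'CA. All Rights Reserved.',
#   '07/05/17 10:58:56', '1241']]
```

The inner Combine expressions (year_range, date_expr, time_expr) keep the default adjacent=True, since their pieces really are contiguous in the input; only the outer Combine that spans a space needs adjacent=False.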


Please also check out the online docs at https://pythonhosted.org/pyparsing, which contain over 1000 lines of inline examples (or you can use Python's 'help' command, as in 'help(Combine)'). – PaulMcG