Regular expression to parse a text document


I am trying to parse a text document that contains !if and !endif markers. I want the text without the !if, the !endif, and everything between them.

For example:

text 
!if 
text1 
!endif 
text2

I want my output to be text + text2 + ...

I tried something like re.findall(r'((^(!if.*!endif))+', text), but it doesn't seem to work for me.
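One reason re.findall cannot give this output on its own: it returns the substrings that match, i.e. the blocks you want to throw away, not the text around them. A minimal sketch on the sample input above; the simplified block pattern here is an assumption for illustration, and re.split is swapped in to show how keeping the non-matching parts differs:

```python
import re

text = "text\n!if\ntext1\n!endif\ntext2"
# Illustrative pattern for one !if ... !endif block:
# non-greedy .*? so it stops at the first !endif, re.S so . also matches newlines.
block = r"!if.*?!endif"

# findall returns the blocks themselves ...
found = re.findall(block, text, flags=re.S)
print(found)  # ['!if\ntext1\n!endif']

# ... whereas re.split returns the pieces between the blocks, which can be joined back.
kept = "".join(re.split(block, text, flags=re.S))
print(repr(kept))  # 'text\n\ntext2'
```

The split version leaves an empty line where the block was, because the newline after !endif is not part of the match; consuming that trailing whitespace is one of the things a re.sub pattern handles.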


2012-07-27 mousey

I don't see how your expression wouldn't raise a SyntaxError, since you never close the apostrophe on your raw string. – 2012-07-28 00:15:14

@JoelCornett It was just a typo. I corrected it. – mousey 2012-07-28 23:07:31

Your regex would be:

^!if$.*?^!endif$\s+

This means:

^  - Match the beginning of a line (because of the re.M flag) 
!if - Match !if 
$  - Match the end of a line (because of the re.M flag) 
.*? - Match any number of characters (non-greedy) (includes line breaks, because of the re.S flag) 
^  - Match the beginning of a line (because of the re.M flag) 
!endif - Match !endif 
$  - Match the end of a line (because of the re.M flag) 
\s+ - Match one or more whitespace characters

So you should be able to use it like this, replacing every occurrence of the above regex with an empty string (nothing):

import re 
s = "text\n!if\ntext1\n!endif\ntext2" 
# Replace every !if ... !endif block (plus the whitespace after it) with nothing
s = re.sub(r"^!if$.*?^!endif$\s+", "", s, flags=re.S | re.M) 
print(s)

This will output:

text 
text2
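The same substitution can also be written with the re.VERBOSE (re.X) flag, so each piece of the pattern carries the explanation from the breakdown list as an inline comment. A minimal sketch; only the layout differs from the one-line pattern:

```python
import re

# Same pattern as the one-liner, spelled out with re.X: unescaped whitespace
# in the pattern is ignored and "#" starts a comment.
PATTERN = re.compile(
    r"""
    ^ !if $       # a line consisting only of !if (re.M: ^/$ match at line boundaries)
    .*?           # anything, as little as possible (re.S: . also matches newlines)
    ^ !endif $    # a line consisting only of !endif
    \s+           # the newline (and any further whitespace) after it
    """,
    re.S | re.M | re.X,
)

s = "text\n!if\ntext1\n!endif\ntext2"
result = PATTERN.sub("", s)
print(result)
```

One consequence of the trailing \s+: a block whose !endif is the very last line of the string, with nothing after it, is not removed, because \s+ needs at least one character. Using \s* instead covers that case.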

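Note that this explicitly requires !if and !endif to each be on a line of their own. If that is not a requirement, you can remove the $ and ^ anchors from the middle of the regex: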

^!if.*?!endif$\s+
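A quick check of the relaxed pattern, assuming one block where the markers share a line and the original multi-line sample (both inputs are made up for illustration). With re.M, !if must still start a line and !endif must still end one:

```python
import re

# Relaxed variant: the middle $ and ^ anchors are dropped,
# so !if and !endif may appear on the same line.
RELAXED = r"^!if.*?!endif$\s+"

same_line = "text\n!if text1 !endif\ntext2"
separate_lines = "text\n!if\ntext1\n!endif\ntext2"

results = [re.sub(RELAXED, "", s, flags=re.S | re.M) for s in (same_line, separate_lines)]
print(results)  # ['text\ntext2', 'text\ntext2']
```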


2012-07-27 23:22:49 nickb

Reading the OP's source shows that the text is actually on multiple lines. I edited the question accordingly. – 2012-07-27 23:57:24

@Karl - I see, thanks for the update. I corrected my answer. – nickb 2012-07-28 00:09:56

I can help with sed:

sed '/^!if$/,/^!endif$/d'

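Here is the algorithm sed uses:

0. Set the variable match = False.
1. Read the next line.
2. Check whether the line equals '!if'. If so, set match = True.
3. If match == True, delete the current line (do not print it); if the line equals '!endif', also set match = False. Then jump to 5.
4. Print the current line.
5. If not EOF, jump to 1.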
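The steps above translate almost directly into a line-by-line Python generator. A minimal sketch, not sed itself: strip_if_blocks is a hypothetical name, and like the sed addresses ^!if$ and ^!endif$ it assumes each marker is a whole line on its own:

```python
def strip_if_blocks(lines):
    """Yield only the lines outside !if ... !endif blocks; the marker lines are dropped too."""
    match = False                   # step 0: not inside a block
    for line in lines:              # steps 1 and 5: read lines until EOF
        bare = line.rstrip("\n")    # compare without the line terminator, like ^...$ in sed
        if bare == "!if":           # step 2: a block starts on this line
            match = True
        if match:                   # step 3: inside a block, so delete (skip) this line
            if bare == "!endif":
                match = False       # the block ends with this line
            continue
        yield line                  # step 4: print (keep) the line


text = "text\n!if\ntext1\n!endif\ntext2"
output = "".join(strip_if_blocks(text.splitlines(keepends=True)))
print(output)
```

Because the generator consumes any iterable of lines, an open file object can be passed straight to it, so a large file does not have to be read into memory at once. As with a sed range, an !if with no matching !endif deletes everything up to the end of the input.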


2012-07-28 23:25:58 alinsoar

I'm not sure whether a shell script is useful to you. – alinsoar 2012-07-28 23:28:47
