2011-11-12 124 views
-7

Possible duplicate:
My regex is not working properly

How to avoid the text between {{ }}

Suppose I have a long text. From the text below I only need the abstract part. How can I skip the text between {{ }}? `

{{ Info extra text}} 
{{Infobox film 
| name   = Papori 
| released  = 1986 
| runtime  = 144 minutes 
| country  = Assam, {{IND}} 
| budget   = [[a]] 
| followed by = free 
}} 
Albert Einstein (/'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] (listen); 14 March 1879 – 18 April 1955) 
was a German-born theoretical physicist who developed the theory of general relativity, effecting a 
revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics 
and one of the most prolific intellects in human history.` 

OUTPUT:

Albert Einstein (/'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] (listen); 14 March 1879 – 18 April 1955) 
was a German-born theoretical physicist who developed the theory of general relativity, effecting a 
revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics 
and one of the most prolific intellects in human history. 
+0

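If you're *really* just asking how to get the abstract of a Wikipedia article, note that [DBpedia](http://dbpedia.org/page/Albert_Einstein) makes Wikipedia articles available in a structured form (and also takes care of the wiki markup). –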

+0

@John Flatness Does DBpedia provide an `API`? –

+1

Duplicate of [My regex is not working properly](http://stackoverflow.com/questions/8029633/my-regex-is-not-working-properly) and [Regarding regex python](http://stackoverflow.com/questions/8028729/rearding-regex-python) – agf

Answers

1

Here is what I did:

>>> import re
>>> text 
"{{ Info extra text}}\n{{Infobox film\n| name   = Papori\n| released  = 1986\n| runtime  = 144 minutes\n| country  = Assam, {{IND}}\n| budget   = [[a]]\n| followed by = free\n}}\nAlbert Einstein (/'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] (listen); 14 March 1879 – 18 April 1955)\n was a German-born theoretical physicist who developed the theory of general relativity, effecting a\n revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics \n and one of the most prolific intellects in human history.`" 
>>> re.sub(r"\{\{[\w\W\n\s]*\}\}", "", text) 
"\nAlbert Einstein (/'ælb?rt 'a?nsta?n/; German: ['alb?t 'a?n?ta?n] (listen); 14 March 1879 – 18 April 1955)\n was a German-born theoretical physicist who developed the theory of general relativity, effecting a\n revolution in physics. For this achievement, Einstein is often regarded as the father of modern physics \n and one of the most prolific intellects in human history.`" 

Edit: Bart's comment is correct.

You might consider this alternative instead:

>>> re.sub(r"\{\{[^\}]*\}\}", "", "{{a\n oaheduh}} b {{c}} d") 
' b  d' 
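One caveat with the pattern above: `[^\}]*` stops at the first `}`, so on the question's Infobox, which contains a nested `{{IND}}`, it removes everything up to `{{IND}}` and leaves the rest of the box behind. A minimal sketch of one way around that, assuming templates are balanced and contain no stray single braces, is to strip only the innermost `{{...}}` blocks and repeat until none are left. The names `INNERMOST` and `strip_templates` are made up for this illustration.

```python
import re

# Matches only an innermost template: "{{", text without any braces, "}}".
INNERMOST = re.compile(r"\{\{[^{}]*\}\}")

def strip_templates(text):
    """Remove {{...}} blocks, nested ones included, by repeated substitution."""
    while True:
        text, count = INNERMOST.subn("", text)
        if count == 0:  # nothing left to remove
            return text

# Shortened version of the question's input (assumption: same shape as the real text).
sample = (
    "{{Infobox film\n"
    "| country = Assam, {{IND}}\n"
    "| budget = [[a]]\n"
    "}}\n"
    "Albert Einstein was a physicist."
)
print(repr(strip_templates(sample)))  # '\nAlbert Einstein was a physicist.'
```

The first pass removes `{{IND}}`, the second removes the now brace-free Infobox, and the third finds nothing and returns. For real Wikipedia dumps a proper wikitext parser is likely the safer choice.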
+1

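That matches the first `{{` and then consumes everything up to the last `}}`. It may work for the (single) example the OP posted, but it would also remove the `b` in `"{{a}} b {{c}}"`. –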
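To make the greedy behaviour described in this comment concrete, here is a small sketch. It uses `.` with `re.DOTALL` as a stand-in for the answer's `[\w\W]` class, and the lazy `.*?` variant is my own addition for contrast, not something proposed in the thread.

```python
import re

s = "{{a}} b {{c}}"

# Greedy: .* runs to the LAST "}}", so the " b " in between is swallowed too.
greedy = re.sub(r"\{\{.*\}\}", "", s, flags=re.DOTALL)

# Lazy: .*? stops at the FIRST "}}" after each "{{".
lazy = re.sub(r"\{\{.*?\}\}", "", s, flags=re.DOTALL)

print(repr(greedy))  # ''
print(repr(lazy))    # ' b '
```

Neither variant understands nesting: with a template inside a template, the lazy form stops at the inner `}}` and leaves the tail of the outer one in place.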

+0

Also, you can drop the `\n\s` from `[\w\W\n\s]`; those characters are already matched by `\W`. –
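A quick way to convince yourself of this: every character is either a word character (`\w`) or not (`\W`), so `[\w\W]` already matches anything, newlines and spaces included. The probe string below is an arbitrary sample chosen for this sketch.

```python
import re

long_cls = re.compile(r"[\w\W\n\s]")   # class from the answer
short_cls = re.compile(r"[\w\W]")      # simplified class

# Arbitrary mix of letters, digits, whitespace, and punctuation.
probe = "a_9 \n\t{}|-"

# Both classes should accept every single character in the probe.
all_match = all(
    long_cls.fullmatch(ch) and short_cls.fullmatch(ch) for ch in probe
)
print(all_match)  # True
```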
