2013-06-11

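This should be a very simple task with the re library. However, I can't seem to split my string at the multiple delimiters ] and [ without the delimiters ending up in the list.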

I have already read Splitting a string with multiple delimiters in Python, Python: Split string with multiple delimiters, and Python: How to get multiple elements inside square brackets.

My string:

data = """This is a string spanning over multiple lines.
At somepoint there will be square brackets.

[like this]

And then maybe some more text.

[And another text in square brackets]"""

It should return:

['This is a string spanning over multiple lines.\nAt somepoint there will be square brackets.','like this', 'And then maybe some more text.', 'And another text in square brackets'] 

A simpler example:

data2 = 'A new string. [with brackets] another line [and a bracket]' 

I tried:

re.split(r'(\[|\])', data2) 
re.split(r'([|])', data2) 

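But these either leave the delimiters in my result list or give a completely wrong list:

['A new string. ', '[', 'with brackets', ']', ' another line ', '[', 'and a bracket', ']', ''] 

The result should be:

['A new string.', 'with brackets', 'another line', 'and a bracket'] 

As a special requirement, all newline characters and whitespace before and after the delimiters should be removed and not be included in the list either.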
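Why the two attempts behave this way can be checked directly. A minimal sketch using data2 from above: the first pattern uses a capturing group, which re.split() copies into the result, and in the second the [|] is a character set that matches only a literal '|'.

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Attempt 1: the parentheses form a capturing group, and re.split()
# returns captured text, so each '[' and ']' appears as its own element.
with_delims = re.split(r'(\[|\])', data2)

# Attempt 2: inside [...] the characters form a set, so '[|]' matches only
# a literal '|'. data2 contains no '|', so the string is not split at all.
no_split = re.split(r'([|])', data2)

print(with_delims)
print(no_split)  # ['A new string. [with brackets] another line [and a bracket]']
```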

Answers


Yes, this is simpler than the non-capturing group I suggested. –


Works great. Just one addition: how can I remove all newlines and spaces at the start/end of the elements? – cherrun


OK, figured it out: call strip() on each element of the list. Thanks again. – cherrun
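A minimal sketch of that strip() approach on the multi-line string from the question. It assumes a group-free pattern r'\[|\]' (the accepted answer's exact pattern is not shown here). Stripping each piece and dropping the ones that end up empty yields the list the question asks for.

```python
import re

data = """This is a string spanning over multiple lines.
At somepoint there will be square brackets.

[like this]

And then maybe some more text.

[And another text in square brackets]"""

# Split on the brackets alone; with no group, the brackets are discarded.
parts = re.split(r'\[|\]', data)

# Strip surrounding whitespace/newlines from each piece and drop pieces
# that end up empty (e.g. the '' left after the final ']').
result = [p.strip() for p in parts if p.strip()]
print(result)
```

Note that strip() only touches the ends of each element, so the newline inside the first sentence is kept, as in the expected output.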


As arshajii pointed out, this particular regex does not need a group at all.

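If you do need a group for a more complex regex, you can use a non-capturing group to split without capturing the delimiters. That can be useful in other cases, but here the extra syntax is messy overkill.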

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern. 

http://docs.python.org/2/library/re.html

So it is unnecessarily complex here, but as a demonstration:

re.split(r'(?:\[|\])', data2) 
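A quick check, sketched on data2, that the non-capturing version gives exactly the same result as the group-free alternation. Neither keeps the delimiters, and neither strips the surrounding spaces.

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# Non-capturing group: the alternation is grouped, but nothing is captured,
# so re.split() leaves the delimiters out of the result.
non_capturing = re.split(r'(?:\[|\])', data2)

# Without any group, the alternation splits in exactly the same places.
no_group = re.split(r'\[|\]', data2)

print(non_capturing)
# ['A new string. ', 'with brackets', ' another line ', 'and a bracket', '']
```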

Use this instead (no capturing group):

re.split(r'\s*\[|]\s*', data) 

Or, shorter:

re.split(r'\s*[][]\s*', data) 
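A sketch running both patterns on the question's multi-line string. Because \s* sits on both sides of the bracket, the whitespace and newlines around each delimiter are consumed by the split itself. The closing ']' at the very end still leaves a trailing '' behind, so this sketch filters out empty strings as an extra step.

```python
import re

data = """This is a string spanning over multiple lines.
At somepoint there will be square brackets.

[like this]

And then maybe some more text.

[And another text in square brackets]"""

# Both patterns swallow the whitespace on either side of a bracket.
a = re.split(r'\s*\[|]\s*', data)
b = re.split(r'\s*[][]\s*', data)  # '[][]' is a set holding ']' and '['

# Drop the empty string produced by the final ']'.
result = [part for part in b if part]
print(result)
```

In [][] the first ']' right after the opening '[' is taken literally, which is what makes the short form work without escapes.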

You could use either split or findall, for example:

data2 = 'A new string. [with brackets] another line [and a bracket]' 

Using split, where the pattern also eats the surrounding spaces, and filtering out empty strings:

import re 
print(list(filter(None, re.split(r'\s*[\[\]]\s*', data2))))
# ['A new string.', 'with brackets', 'another line', 'and a bracket'] 

Or perhaps adapt a findall approach:

# note: \b inside [...] means backspace, not a word boundary, so it is dropped here
print(re.findall(r'[^\[\]]+', data2))
# ['A new string. ', 'with brackets', ' another line ', 'and a bracket'] # needs a little work on leading/trailing stuff...
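One way to do that "little work", sketched on data2: strip each run that findall() returns and drop any run that is only whitespace. With data2 no such run occurs, but one would appear for input like '] [', so the filter is kept as a precaution.

```python
import re

data2 = 'A new string. [with brackets] another line [and a bracket]'

# findall() returns the runs of characters that are not brackets ...
chunks = re.findall(r'[^\[\]]+', data2)

# ... which still carry surrounding spaces. A whitespace-only run (for
# example between '] [') would come back as its own chunk, so drop those too.
result = [c.strip() for c in chunks if c.strip()]
print(result)
```

Unlike split, findall never produces the trailing '' after a final ']', because it only returns actual matches.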