2012-07-16 123 views
0

python split string with x000

How can I split the following string on every '\x000', '\x001', '\x002'? I tried a regular expression like the one below, but it doesn't work!

import re

z = re.compile(r'[\x000\x001\x002\x003\x004\x005]:')

line = '114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query?dest=adjustable_layout&from_url=http%3A%2F%2Fwww.nownews.com%2F&referer=&width=300&height=330&api_version=1 HTTP/1.1" 200 10481 "http://www.nownews.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Foxy/1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; yie8)"\x000:1342434156.712809 get_cache http://www.nownews.com/\x000:1342434156.717942 Cache Hits agtzfnRhZ3Rvby1lY3IjCxIGTmV3c0FkIhdodHRwOi8vd3d3Lm5vd25ld3MuY29tLww\x000:1342434156.731564 new version\x001:1342434156.732352 display:[(u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'26\'), (u\'1\', u\'114\', u\'13\')]' 
z.split(line) 

EDIT1

The string contains \x000, \x001, \x002, and so on. I want to split the string on these sequences.

The expected output should be:

['114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query?dest=adjustable_layout&from_url=http%3A%2F%2Fwww.nownews.com%2F&referer=&width=300&height=330&api_version=1 HTTP/1.1" 200 10481 "http://www.nownews.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Foxy/1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; yie8)"', '\x000:1342434156.712809 get_cache http://www.nownews.com/', '\x000:1342434156.717942 Cache Hits agtzfnRhZ3Rvby1lY3IjCxIGTmV3c0FkIhdodHRwOi8vd3d3Lm5vd25ld3MuY29tLww', '\x000:1342434156.731564 new version', '\x001:1342434156.732352 display:[(u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'26\'), (u\'1\', u\'114\', u\'13\')]'] 
+2

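It is rather unclear what you are trying to achieve. Are you aware that `"\x000"` is a two-character string consisting of a NUL character and the digit `0`? Maybe you want `"\000"`, or equivalently `"\x00"`, instead? – 2012-07-16 10:50:14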

+0

Could you also give the expected output? – Kamal 2012-07-16 10:51:38

Answers

3

\x000 is a two-byte string consisting of \x00 (hex 0x00) followed by the digit 0 (hex 0x30).
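A quick way to confirm this is to inspect the string in an interpreter. The sketch below is only an illustration; the sample text `'a0b1\x00c'` is made up, and the shortened character class stands in for the longer one in the question.

```python
import re

s = '\x000'

# \x consumes exactly two hex digits, so the third "0" is an ordinary digit.
print(len(s))                      # 2
print([hex(ord(c)) for c in s])    # ['0x0', '0x30']
print(s == '\x00' + '0')           # True

# Inside [...] every character is a separate alternative, so this class
# matches a single NUL, a single "0", or a single "1", never the pair.
char_class = re.compile(r'[\x000\x001]')
print(char_class.findall('a0b1\x00c'))  # ['0', '1', '\x00']
```

This is why the pattern in the question never matches the delimiters: the class matches one character, and the colon would have to follow that single character directly.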

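Therefore, you can't use it in a character class like this. A pattern that matches \x00 followed by one of the digits 0-5 and a colon, such as (\x00[0-5]:), does work.

By enclosing the regex in parentheses, the delimiters also become part of the result list, although they are not directly attached to the string parts they separated (as in the edited question).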
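As a minimal sketch of splitting with a capturing group: the `line` below is a shortened, made-up stand-in for the log line in the question, and the re-joining step at the end is an addition for illustration, not part of the original answer.

```python
import re

# Hypothetical short sample; the real log line is much longer.
line = 'request\x000:1.0 get_cache\x000:2.0 hit\x001:3.0 display'

# The capturing group makes re.split() return the delimiters as well.
z = re.compile(r'(\x00[0-5]:)')
parts = z.split(line)
print(parts)
# ['request', '\x000:', '1.0 get_cache', '\x000:', '2.0 hit', '\x001:', '3.0 display']

# Optionally glue each delimiter back onto the text that follows it.
head, rest = parts[0], parts[1:]
joined = [head] + [delim + text for delim, text in zip(rest[0::2], rest[1::2])]
print(joined)
# ['request', '\x000:1.0 get_cache', '\x000:2.0 hit', '\x001:3.0 display']
```

With one capturing group the result alternates text, delimiter, text, so pairing the slices `rest[0::2]` and `rest[1::2]` lines each delimiter up with its following chunk.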

If you really want to keep the delimiters as part of the result strings, you can't use .split(). Instead, use .findall():

>>> z = re.compile(r'(?:\x00[0-5]:)?(?:(?!\x00[0-5]:).)*', re.S) 
>>> z.findall(line) 

Explanation:

(?:\x00[0-5]:)? # Match an optional leading \x000:, \x001: etc. 
(?:    # Match... 
(?!\x00[0-5]:) # as long as we're not at the start of another \x00n: 
.    # any character (including newlines: re.S) 
)*    # any number of times. 
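To see the pattern in action, here is a small sketch on a shortened, made-up `line` (an assumption, not the original log data). Because every part of the pattern is optional, `findall()` also reports an empty match at the end of the string, so the sketch filters that out. The lookahead-based `re.split()` at the end is a different technique from the one above, shown only for comparison; it relies on Python 3.7 or later, where `re.split()` can split on empty matches.

```python
import re

# Hypothetical short sample standing in for the log line.
line = 'request\x000:1.0 get_cache\x000:2.0 hit\x001:3.0 display'

z = re.compile(r'(?:\x00[0-5]:)?(?:(?!\x00[0-5]:).)*', re.S)
matches = z.findall(line)

# The pattern can match the empty string, so drop the empty trailing match.
pieces = [m for m in matches if m]
print(pieces)
# ['request', '\x000:1.0 get_cache', '\x000:2.0 hit', '\x001:3.0 display']

# Alternative (Python 3.7+): split at the zero-width position before each delimiter.
alt = re.split(r'(?=\x00[0-5]:)', line)
print(alt == pieces)  # True
```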
+0

@SvenMarnach: The regex splits on `\x00` followed by a digit 0-5, followed by a colon. – 2012-07-16 10:55:58