Python regex matching with carriage returns

I have the following data:
POST / HTTP/1.1
User-Agent: curl/7.27.0
Host: 127.0.0.1
Accept: */*
Content-Length: 55
Content-Type: application/x-www-form-urlencoded
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk
or
POST / HTTP/1.1\r\n
User-Agent: curl/7.27.0\r\n
Host: 127.0.0.1\r\n
Accept: */*\r\n
Content-Length: 55\r\n
Content-Type: application/x-www-form-urlencoded\r\n
\r\n
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n
or
POST / HTTP/1.1^M
User-Agent: curl/7.27.0^M
Host: 127.0.0.1^M
Accept: */*^M
Content-Length: 55^M
Content-Type: application/x-www-form-urlencoded^M
^M
id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk^M
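How can I match only the id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk string? I mean the printable text between two consecutive ends of line (\r\n or ^M) and the next end of line (\r\n or ^M).
I was thinking of something like this:
re.findall(r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)', buf, re.MULTILINE|re.DOTALL)
but it does not match. What am I doing wrong?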
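One likely reason the attempt finds nothing: the pattern begins with ^>, which requires a literal > at the start of a line, and none of the lines in the sample shown starts with one. The sketch below checks that, then shows a non-regex alternative that splits the buffer at the first blank line with str.partition. The sample buf is an assumption (CRLF line endings, as in the second listing).

```python
import re

# Sample request, assumed to use CRLF ("\r\n") line endings throughout.
buf = (
    "POST / HTTP/1.1\r\n"
    "User-Agent: curl/7.27.0\r\n"
    "Host: 127.0.0.1\r\n"
    "Accept: */*\r\n"
    "Content-Length: 55\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "\r\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\r\n"
)

# The pattern from the question: '^>' needs a line that starts with '>',
# and this buffer has no such line, so findall returns an empty list.
attempt = re.findall(
    r'^>([^\r\n]+)[\r\n]([a-zA-Z0-9=%&\r\n]+)',
    buf,
    re.MULTILINE | re.DOTALL,
)
print(attempt)

# Alternative without a regex: everything after the first blank line
# (two consecutive CRLFs) is the body; trailing line endings are removed.
headers, separator, body = buf.partition("\r\n\r\n")
body = body.rstrip("\r\n")
print(body)
```

If the line endings might be bare \n instead of \r\n, the separator passed to partition would need to change accordingly.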
Hmm, this gives me something like ['\r\n\r\nid=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk\x00\x00\x00\x00\x00\x00\x00\x00 ...'] (cut). Why? – bsteo
@xtmtrx Huh, your file contains null characters; '\x00' is the null character, by the way. I'm not entirely familiar with using unicode characters in regex, but you can try: (?<![\r\n])(?:\r\n|\r|\n){2}[^\r\n\x00]+ (I added \x00 at the end). – Jerry
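A small sketch of what excluding \x00 from the negated character class changes. The buffer is hypothetical: a shortened request padded with NUL bytes, as a fixed-size receive buffer might be (an assumption, since the thread does not say where the nulls come from). It also shows one side effect of the \r\n|\r|\n alternation: {2} can be satisfied by a single CRLF read as \r followed by \n, so header lines match as well.

```python
import re

# Hypothetical buffer: a shortened request followed by NUL padding.
buf = (
    "POST / HTTP/1.1\r\n"
    "Host: 127.0.0.1\r\n"
    "\r\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
) + "\x00" * 16

# Without \x00 in the negated class, the match runs on through the padding.
no_nul_guard = re.findall(r'(?:\r\n){2}[^\r\n]+', buf)

# The suggested pattern: \x00 is excluded, so the padding is dropped, but
# a single "\r\n" also counts as two line endings (\r, then \n), so the
# header line after the request line is returned too.
as_suggested = re.findall(r'(?<![\r\n])(?:\r\n|\r|\n){2}[^\r\n\x00]+', buf)

# Restricting the separator to CRLF leaves only the line after the blank line.
crlf_only = re.findall(r'(?<![\r\n])(?:\r\n){2}[^\r\n\x00]+', buf)

print(no_nul_guard)
print(as_suggested)
print(crlf_only)
```

Stripping the padding first, for example with buf.rstrip("\x00"), would be another option if the nulls only ever appear at the end.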
With found_rx += re.findall(r'(?<![\r\n])(?:\r\n){2}[^\r\n\x00]+', buf, re.MULTILINE) I got ['\r\n\r\nid=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk']. I removed the |\r|\n alternatives. Why do I still get the first two \r\n in my string? – bsteo
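The leading \r\n\r\n is there because (?:\r\n){2} is part of the match, and re.findall returns the whole match when the pattern has no capturing group. A minimal sketch of two ways around it, using a shortened, hypothetical buf: put a capturing group around the wanted part, or move the separator into a fixed-width lookbehind.

```python
import re

# Shortened, hypothetical request with CRLF line endings.
buf = (
    "POST / HTTP/1.1\r\n"
    "Host: 127.0.0.1\r\n"
    "\r\n"
    "id=1234&var=test&nextvar=hh%20hg&anothervar=BB55SSKKKkk"
)

# No capturing group: findall returns the full match, separator included.
whole = re.findall(r'(?<![\r\n])(?:\r\n){2}[^\r\n\x00]+', buf)

# One capturing group: findall returns only what the group matched.
grouped = re.findall(r'(?<![\r\n])(?:\r\n){2}([^\r\n\x00]+)', buf)

# Lookbehind: the separator must precede the match but is not part of it.
behind = re.findall(r'(?<=\r\n\r\n)[^\r\n\x00]+', buf)

print(whole)
print(grouped)
print(behind)
```

The capturing-group form is probably the easier one to extend, since Python lookbehinds must have a fixed width and so cannot cover both \r\n and bare \n separators in one assertion.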