2013-11-04 49 views
12

What is the maximum number of repetitions allowed in a Python regular expression?

In Python 2.7 and 3, the following works:

>>> re.search(r"a{1,9999}", 'aaa') 
<_sre.SRE_Match object at 0x1f5d100> 

But this raises an error:

>>> re.search(r"a{1,99999}", 'aaa') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/lib/python2.7/re.py", line 142, in search 
    return _compile(pattern, flags).search(string) 
    File "/usr/lib/python2.7/re.py", line 240, in _compile 
    p = sre_compile.compile(pattern, flags) 
    File "/usr/lib/python2.7/sre_compile.py", line 523, in compile 
    groupindex, indexgroup 
RuntimeError: invalid SRE code 

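There seems to be an upper limit on the allowed number of repetitions. Is this part of the regular expression specification, or a Python-specific limitation? If it is specific to Python, is the actual number documented somewhere, and does it vary between implementations?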

Answer

14

A quick manual binary search reveals the answer: 65535.

>>> re.search(r"a{1,65535}", 'aaa') 
<_sre.SRE_Match object at 0x2a9a68> 
>>> 
>>> re.search(r"a{1,65536}", 'aaa') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 142, in search 
    return _compile(pattern, flags).search(string) 
    File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/re.py", line 240, in _compile 
    p = sre_compile.compile(pattern, flags) 
    File "/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/sre_compile.py", line 523, in compile 
    groupindex, indexgroup 
OverflowError: regular expression code size limit exceeded 

This is discussed here:

The limit is an implementation detail. The pattern is compiled into codes which are then interpreted, and it just happens that the codes are (usually) 16 bits, giving a range of 0..65535, but it uses 65535 to represent no limit and doesn't warn if you actually write 65535.

The quantifiers use 65535 to represent no upper limit, so ".{0,65535}" is equivalent to ".*".
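The manual binary search used in this answer can be automated. The following is a minimal sketch, not part of the original answer: the helper names `compiles` and `largest_accepted_bound` are made up for illustration, and it assumes that a rejected bound surfaces as `re.error`, `OverflowError` or `RuntimeError` (the latter two are the exceptions seen in the tracebacks on this page). The number it finds depends on the Python version and build, so no particular output is claimed.

```python
import re


def compiles(n):
    """Return True if the pattern a{1,n} compiles on this interpreter."""
    try:
        re.compile("a{1,%d}" % n)
        return True
    except (re.error, OverflowError, RuntimeError):
        # Assumption: a too-large bound is reported through one of these.
        return False


def largest_accepted_bound(cap=2 ** 64):
    """Return the largest n for which a{1,n} compiles.

    Returns None if no rejected bound is found below `cap`
    (for example on an implementation without such a limit).
    """
    lo, hi = 1, 2
    # Phase 1: double hi until a{1,hi} is rejected.
    while compiles(hi):
        lo, hi = hi, hi * 2
        if hi > cap:
            return None
    # Phase 2: bisect. Invariant: a{1,lo} compiles, a{1,hi} does not.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if compiles(mid):
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    print(largest_accepted_bound())
```

Doubling first keeps the number of compile attempts logarithmic in the limit, which matters because the limit is not known in advance.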


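Thanks to the authors of the comments below for pointing out a few more things:

  • CPython implements this limit in _sre.c. (@LukasGraf)
  • There is a constant MAXREPEAT in sre_constants.py that holds this maximum repetition value: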

    >>> import sre_constants 
    >>> 
    >>> sre_constants.MAXREPEAT 
    65535 
    

    (@MarkkuK and @hcwhsa.)
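Reading the constant can be made a little more robust across versions. The sketch below is an assumption-laden illustration rather than a documented API: `get_maxrepeat` is a hypothetical helper, `_sre` is a private CPython extension module, and `sre_constants` is deprecated in newer Python 3 releases, so its import is wrapped to silence the deprecation warning. The value returned is whatever the running interpreter defines and need not be 65535.

```python
import warnings


def get_maxrepeat():
    """Return MAXREPEAT as defined by the running interpreter.

    Assumes CPython: tries the private _sre module first, then falls
    back to sre_constants (the module named in the answer above).
    """
    try:
        from _sre import MAXREPEAT
    except ImportError:
        with warnings.catch_warnings():
            # sre_constants warns about deprecation on newer Python 3.
            warnings.simplefilter("ignore", DeprecationWarning)
            from sre_constants import MAXREPEAT
    return MAXREPEAT


if __name__ == "__main__":
    print(get_maxrepeat())
```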

+1

In case you want to point this out in your answer: for CPython, this limit is implemented in '_sre.c' (http://hg.python.org/cpython/file/7268838063e1/Modules/_sre.c#l2694) –

+3

Also, if you look in sre_constants.py, you will find 'MAXREPEAT = 65535' –

+1

This limit can be read with: 'import sre_constants; print sre_constants.MAXREPEAT' –