2010-10-15 132 views
1

Regular expressions in Python

I have the following string:

schema(field1, field2, field3, field4 ... fieldn) 

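I need to convert this string into an object whose name attribute is `schema` and whose other attribute is a list of the field names.

How can I do this in Python using regular expressions?

Answers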

1

For something like this you probably want a regular expression together with tests:

import unittest 

import re 

# Verbose regular expression! http://docs.python.org/library/re.html#re.X 
p = r""" 

(?P<name>[^(]+)   # Match the pre-open-paren name. 
\(      # Open paren 
(?P<fields>    # Comma-separated fields 
    (?: 
     [a-zA-Z0-9_-]+ 
     (?:,\ )   # Subsequent fields must be separated by a comma and a space 
    )* 
    [a-zA-Z0-9_-]+  # At least one field. No trailing comma or space allowed. 
) 

\)      # Close-paren 
""" 

# Compiled for speed! 
cp = re.compile(p, re.VERBOSE) 

class Foo(object): 
    pass 


def validateAndBuild(s): 
    """Validate a string and return a built object. 
    """ 
    match = cp.search(s) 
    if match is None: 
        raise ValueError('Bad schema: %s' % s) 

    schema = match.groupdict() 
    foo = Foo() 
    foo.name = schema['name'] 
    foo.fields = schema['fields'].split(', ') 

    return foo 



class ValidationTest(unittest.TestCase): 
    def testValidString(self):
        s = "schema(field1, field2, field3, field4, fieldn)"

        obj = validateAndBuild(s)

        self.assertEqual(obj.name, 'schema')

        self.assertEqual(obj.fields, ['field1', 'field2', 'field3', 'field4', 'fieldn'])

    invalid = [
        'schema field1 field2',
        'schema(field1',
        'schema(field1 field2)',
    ]

    def testInvalidString(self):
        for s in self.invalid:
            self.assertRaises(ValueError, validateAndBuild, s)


if __name__ == '__main__': 
    unittest.main() 
+1

How is this different from my answer, apart from all the redundant test code and the ugly regular expression? – SilentGhost 2010-10-15 15:05:49

+0

@David, how would I change the regular expression to make the space between fields optional? – 2010-10-15 15:14:25

+0

On line 13, change '\ )' to '\ ?)'. This makes the escaped space optional. (See the section on "quantifiers".) – 2010-10-15 15:20:23
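The change suggested in the comment above can be sketched as follows. This is only an illustration, not the answerer's code: the names `SCHEMA_RE` and `parse_schema` are hypothetical, and it assumes the whole string has to match (via `re.fullmatch`, available from Python 3.4) instead of using `search`.

```python
import re

# Hypothetical variant of the verbose pattern in which the space after
# each comma is optional.
SCHEMA_RE = re.compile(r"""
    (?P<name>[^(]+)               # name before the opening paren
    \(
    (?P<fields>
        (?:[a-zA-Z0-9_-]+,\ ?)*   # zero or more fields, each followed by a comma and an optional space
        [a-zA-Z0-9_-]+            # last field, no trailing comma
    )
    \)
""", re.VERBOSE)


def parse_schema(s):
    """Return (name, fields), or raise ValueError if s is malformed."""
    match = SCHEMA_RE.fullmatch(s)
    if match is None:
        raise ValueError('Bad schema: %s' % s)
    # Split on a comma followed by an optional space, mirroring the pattern.
    fields = re.split(r', ?', match.group('fields'))
    return match.group('name'), fields
```

Because the separator is no longer fixed, splitting with `', '` alone would leave fields such as `'a,b'` unsplit, which is why the sketch uses `re.split` with the same optional space.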

5

Are you looking for something like this?

>>> import re
>>> s = 'schema(field1, field2, field3, field4, field5)'
>>> name, _, fields = s[:-1].partition('(')
>>> fields = fields.split(', ')
>>> if not all(re.match(r'[a-z]+\d+$', i) for i in fields):
...     print('bad input')
...
>>> sch = type(name, (object,), {'attr': fields}) 
>>> sch 
<class '__main__.schema'> 
>>> sch.attr 
['field1', 'field2', 'field3', 'field4', 'field5'] 
+1

Thanks, but I am looking for a solution that also lets me validate, along the way, that the string is in the format specified above. – 2010-10-15 14:21:15

+0

Just curious: is there a specific reason you used 'partition()' rather than 'split(..., 1)', or is it simply preference? Either way, +1 :) – Wolph 2010-10-15 14:21:24

+1

@Yasmin: is this it? – SilentGhost 2010-10-15 14:21:39
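On the 'partition()' versus 'split(..., 1)' question raised in the comments, the short sketch below shows one practical difference between the two. It does not claim to be the answerer's reason for the choice; the variable names are made up for illustration.

```python
s = 'schema(field1, field2)'

# Both calls split on the first '(' only.
name, sep, rest = s[:-1].partition('(')
parts = s[:-1].split('(', 1)

# When the separator is missing, partition still returns three items,
# so tuple unpacking keeps working and sep is the empty string.
missing = 'schema field1'.partition('(')

# split returns a one-element list instead, so unpacking it into two
# names would raise ValueError.
missing_split = 'schema field1'.split('(', 1)
```

Checking whether `sep` is empty is therefore a cheap way to detect input that has no opening parenthesis at all.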

0

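You could use something like this (in two passes, because Python's re does not support nested captures (thanks to SilentGhost for pointing that out)):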

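import re

# s is the string to parse. Note that these patterns accept lowercase
# letters only, with no spaces after the commas.
pattern = re.compile(r"^([a-z]+)\(([a-z,]*)\)$")

ret = pattern.match(s)

if ret is None:
    ...
else:
    f = ret.groups()
    name = f[0]
    args = f[1]

    arg_pattern = re.compile(r"^([a-z]+)(,[a-z]+)*$")

    ret2 = arg_pattern.match(args)

    # same checking as above
    if ret2 is None:
        ...
    else:
        # A repeated group keeps only its last match, so ret2.groups()
        # would return just the first and last field; split instead.
        args_f = args.split(',')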
+1

It only works for two arguments; Python does not support nested captures – SilentGhost 2010-10-15 14:52:31

+0

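Does it work for more than two fields? I tried four fields, and printing the fields prints the schema name, the first field and the last field. Is that a bug? – 2010-10-15 14:56:02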

+1

Yes (see SilentGhost's comment). I am trying to fix this... – ThR37 2010-10-15 14:57:52
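The behaviour discussed in these comments can be reproduced with a short sketch. The sample input and variable names are assumptions chosen for illustration; the pattern is the argument pattern from the answer.

```python
import re

args = 'a,b,c,d'
m = re.match(r'^([a-z]+)(,[a-z]+)*$', args)

# A repeated capturing group retains only its last match, so the
# middle fields are lost here.
first_and_last = m.groups()

# One way around it: use the pattern purely for validation, then
# extract the fields with split (or findall) in a second step.
all_fields = args.split(',')
found = re.findall(r'[a-z]+', args)
```

Here `first_and_last` holds only `'a'` and `',d'`, while `all_fields` and `found` both contain all four fields.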