2010-04-23 59 views
2

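I have a text file with a few thousand lines. I want to parse this file into a database and decided to write a regular expression. Here is part of the file: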

blablabla checked=12 unchecked=1 
blablabla unchecked=13 
blablabla checked=14 

As a result, I would like to get something like

(12,1) 
(0,13) 
(14,0) 

Is this possible?

Answers

1
import re 

s = """blablabla checked=12 unchecked=1 
blablabla unchecked=13 
blablabla checked=14""" 

regex = re.compile(r"blablabla (?:(?:checked=)(\d+))? ?(?:(?:unchecked=)(\d+))?") 

for line in s.splitlines(): 
    print(regex.match(line).groups())

This gives you strings (or None where a field was not found), but the idea should be clear.
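To get the (12, 1) style tuples the question asks for, the None placeholders can be replaced with 0 and the strings converted with int. A minimal sketch, reusing the pattern above; the helper name `to_counts` and the choice to return (0, 0) for a line that does not match at all are assumptions made for illustration:

```python
import re

# Same idea as the pattern above: both groups are optional.
regex = re.compile(r"blablabla (?:checked=(\d+))? ?(?:unchecked=(\d+))?")

def to_counts(line):
    """Return (checked, unchecked) as ints, using 0 for a missing field."""
    match = regex.match(line)
    if match is None:  # assumption: treat an unrecognized line as (0, 0)
        return (0, 0)
    return tuple(int(value) if value is not None else 0
                 for value in match.groups())

sample = """blablabla checked=12 unchecked=1
blablabla unchecked=13
blablabla checked=14"""

results = [to_counts(line) for line in sample.splitlines()]
print(results)
```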

6

It is simplest to use two different regular expressions to pull out the two numbers: r" checked=(\d+)" and r" unchecked=(\d+)"
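A short sketch of how the two patterns might be applied per line. The function name `extract` and the choice of 0 as the default for a missing field are assumptions, picked to match the output requested in the question:

```python
import re

# The leading space keeps " checked=" from matching inside "unchecked=".
checked_re = re.compile(r" checked=(\d+)")
unchecked_re = re.compile(r" unchecked=(\d+)")

def extract(line):
    """Return (checked, unchecked) as ints; a missing field counts as 0."""
    checked = checked_re.search(line)
    unchecked = unchecked_re.search(line)
    return (
        int(checked.group(1)) if checked else 0,
        int(unchecked.group(1)) if unchecked else 0,
    )

lines = [
    "blablabla checked=12 unchecked=1",
    "blablabla unchecked=13",
    "blablabla checked=14",
]
pairs = [extract(line) for line in lines]
print(pairs)
```

Because each field is searched for independently, the order of the two fields on a line does not matter.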

1
import re

lines = ["blablabla checked=12 unchecked=1", "blablabla unchecked=13"]

# \b stops "checked=" from matching inside "unchecked="; \d+ captures all digits
p1 = re.compile(r'\bchecked=(\d+)\s+unchecked=(\d+)')
p2 = re.compile(r'\bchecked=(\d+)')
p3 = re.compile(r'\bunchecked=(\d+)')
for line in lines:
    m = p1.search(line)
    if m:
        print(m.group(1), m.group(2))
    else:
        m = p2.search(line)
        if m:
            print(m.group(1), "0")
        else:
            m = p3.search(line)  # fall back to the unchecked-only pattern
            if m:
                print("0", m.group(1))

1

Another approach:

import sys
import re

r = re.compile(r"((?:un)?checked)=(\d+)")

# Each line yields a dict holding whichever of the two fields it contains
with open(sys.argv[1]) as f:
    for line in f:
        d = dict(r.findall(line))
        print(d)

Output:

{'checked': '12', 'unchecked': '1'} 
{'unchecked': '13'} 
{'checked': '14'} 
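If tuples are wanted rather than dicts, `dict.get` with a default can fill in the missing field. A small sketch along those lines; the function name `counts` and the default of 0 are assumptions, and the sample lines are inlined here instead of being read from a file:

```python
import re

r = re.compile(r"((?:un)?checked)=(\d+)")

def counts(line):
    """Turn one line into (checked, unchecked), defaulting to 0."""
    d = dict(r.findall(line))
    return (int(d.get("checked", 0)), int(d.get("unchecked", 0)))

sample_lines = [
    "blablabla checked=12 unchecked=1",
    "blablabla unchecked=13",
    "blablabla checked=14",
]
tuples = [counts(line) for line in sample_lines]
print(tuples)
```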
0

This is more general and reusable, I believe:

import re 

def tuple_producer(input_lines, wanted_attributes):
    """Extract specific attributes from lines 'blabla attribute=value …'"""
    for line in input_lines:
        line_attributes = {}
        for match in re.finditer(r"(\w+)=(\d+)", line):
            line_attributes[match.group(1)] = int(match.group(2))  # int cast
        yield tuple(
            line_attributes.get(attribute, 0)  # int constant
            for attribute in wanted_attributes)


>>> lines= """blablabla checked=12 unchecked=1 
blablabla unchecked=13 
blablabla checked=14""".split("\n") 
>>> list(tuple_producer(lines, ("checked", "unchecked"))) 
[(12, 1), (0, 13), (14, 0)] 

# and an irrelevant example 
>>> list(tuple_producer(lines, ("checked", "inexistant"))) 
[(12, 0), (0, 0), (14, 0)] 

Note the conversion to int. If you don't need it, remove the int cast and change the int constant 0 to "0".
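Since the question's end goal is loading the values into a database, integer tuples like these can be handed straight to a bulk insert. A rough sketch using the standard-library `sqlite3` module; the database engine, the table name `counts`, its column layout, and the helper `parse` are all assumptions, as the question does not say which database is used:

```python
import re
import sqlite3

def parse(lines, wanted=("checked", "unchecked")):
    """Yield one tuple of ints per line, with 0 for a missing attribute."""
    for line in lines:
        found = {key: int(value)
                 for key, value in re.findall(r"(\w+)=(\d+)", line)}
        yield tuple(found.get(name, 0) for name in wanted)

lines = [
    "blablabla checked=12 unchecked=1",
    "blablabla unchecked=13",
    "blablabla checked=14",
]

# Hypothetical schema: an in-memory database with one table for the counts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (checked INTEGER, unchecked INTEGER)")
# executemany accepts any iterable of parameter tuples, including a generator.
conn.executemany("INSERT INTO counts VALUES (?, ?)", parse(lines))
conn.commit()

rows = conn.execute(
    "SELECT checked, unchecked FROM counts ORDER BY rowid").fetchall()
print(rows)
```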