Python: How to search for multiple patterns in multiple files

I want to "grep" multiple regular expressions across multiple files. I keep all the regexes in one file (one per line), which I load as follows to build a "super regex":

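import re

rex = []
with open('regex.dic') as dic:
    for l in dic:
        # Skip comment lines; wrap each pattern so it matches the whole line
        if not l.startswith('#'):
            rex.append('^.*%s.*$' % l.strip())
rex = '|'.join(rex)
debug('rex=' + rex)  # debug() is defined elsewhere in the script
global regex
regex = re.compile(rex, re.IGNORECASE | re.MULTILINE)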

Then I check my files like this:

with open(fn, 'r') as f:
    data = f.readlines()
for i, line in enumerate(data):
    if len(line) <= 512:  # Sanity check
        if regex.search(line):
            if not alreadyFound:
                log("[!]Found in %s:" % fn)
                alreadyFound = True
                found = True
                copyFile(fn)
            # max() keeps the slice start from going negative near the top of the file
            start = max(0, i - args.context)
            log("\t%s" % '\t'.join(data[start:i + args.context + 1]).strip())

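This works, but it feels inefficient and dangerous (some of the regexes in the dic file could break the "super regex"). I was thinking of looping over the array of regexes instead, but that would mean scanning each file multiple times :/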
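To illustrate the risk mentioned here, a minimal sketch: one malformed entry is enough to make the whole combined expression fail to compile. The pattern list and the `try_compile` helper are made up for this example and are not part of the original script.

```python
import re

# Hypothetical pattern list; the second entry has an unbalanced parenthesis.
patterns = ['foo', 'ba(r', 'baz']


def try_compile(pattern):
    """Return True if the pattern compiles, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Build the combined expression the same way as the question does.
combined = '|'.join('^.*%s.*$' % p for p in patterns)

combined_ok = try_compile(combined)                   # one bad entry breaks everything
per_pattern_ok = [try_compile(p) for p in patterns]   # only the bad entry fails alone
```

Checking each pattern on its own shows which line of the dic file is at fault, which the error raised for the combined expression does not make obvious.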

Any bright ideas on how to do this? Thanks!

Source

2013-05-07 Choumarin

I don't really see the problem here. It's not necessarily very elegant, but as you said, it gets the job done relatively efficiently. – jdotjdot 2013-05-07 15:10:29

pattern = l.strip()
if pattern and not pattern.startswith('#'):
    try:
        re.compile(pattern)  # check this pattern on its own first
    except re.error:
        pass  # handle any way you want
    else:
        rex.append('^.*({0}).*$'.format(pattern))

This takes care of malformed regular expressions.
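Put together, the idea could look roughly like the sketch below. It is only one possible arrangement: `build_regex` and `matching_lines` are hypothetical names, the sample patterns and lines are invented, and it swaps the `^.*( ... ).*$` wrapper for a plain non-capturing group, on the assumption that `search()` is applied line by line so the anchors are not needed.

```python
import re


def build_regex(lines):
    """Combine the valid patterns into one regex; also return the rejected ones."""
    good, bad = [], []
    for raw in lines:
        pattern = raw.strip()
        if not pattern or pattern.startswith('#'):
            continue  # skip blank lines and comments
        try:
            re.compile(pattern)
        except re.error:
            bad.append(pattern)
        else:
            # A non-capturing group keeps a '|' inside one pattern
            # from merging with the alternation between patterns.
            good.append('(?:%s)' % pattern)
    if not good:
        return None, bad
    return re.compile('|'.join(good), re.IGNORECASE), bad


def matching_lines(regex, lines):
    """Return (index, line) pairs for every line the combined regex matches."""
    return [(i, line) for i, line in enumerate(lines) if regex.search(line)]


# Invented sample data standing in for regex.dic and a scanned file.
regex, rejected = build_regex(['# comment', 'err(or|no)', 'ba(d', '', 'warn'])
hits = matching_lines(regex, ['all fine\n', 'ERRNO 5\n', 'Warning: x\n', 'bad\n'])
```

Each file is still scanned only once, and the rejected patterns can be logged instead of aborting the run. Patterns that rely on numbered backreferences or inline flags may still misbehave once combined; the sketch does not handle that.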

Source

2013-05-07 15:25:22 Elazar
