2016-04-05 50 views
2

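Python regex groupdict returns single characters instead of strings

I'm running into a very confusing problem with regex matching in Python. I have a pair of regex patterns that work fine in debugging tools such as regex101.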

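However, once I put them into the script, the patterns fail to match anything unless they are compiled and prefixed with an r before the opening quote.

Even then, the matches return single characters from the group dict.

Can anyone give me any pointers as to what I'm doing wrong here?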

deobf.py:

#!/bin/python 
import sys 
import getopt 
import re 
import base64 

#################################################################################### 
# 
# Setting up global vars and functions 
# 
#################################################################################### 

# Assemble Pattern Dictionary 
pattern={} 
pattern["HexOct"]=re.compile(r'([\"\'])(?P<obf_code>(\\[xX012]?[\dA-Fa-f]{2})*)\1') 
pattern["Base64"]=re.compile(r'([\"\'])(?P<obf_code>[\dA-Za-z\/\+]{15,}={0,2})\1') 

# Assemble more precise Pattern handling: 
sub_pattern={} 
sub_pattern["HexOct"]=re.compile(r'((?P<Hex>\\[xX][\dA-Fa-f]{2})|(?P<Oct>\\[012]?[\d]{2}))') 

#print pattern # trying to Debug Pattern Dicts 
#print sub_pattern # trying to Debug Pattern Dicts 

# Global Var init 
file_in="" 
file_out="" 
code_string="" 
format_code = False 

# Prints the Help screen 
def usage(): 
    print "How to use deobf.py:" 
    print "-----------------------------------------------------------\n" 
    print "$ python deobf.py -i {inputfile.php} [-o {outputfile.txt}]\n" 
    print "Other options include:" 
    print "-----------------------------------------------------------" 
    print "-f : Format - Format the output code with indentations" 
    print "-h : Help - Prints this info\n" 
    print "-----------------------------------------------------------" 
    print "You can also use the long forms:" 
    print "-i : --in" 
    print "-o : --out" 
    print "-f : --format" 
    print "-h : --Help" 

# Combination wrapper for the above two functions 
def deHexOct(obf_code): 
    match = re.search(sub_pattern["HexOct"],obf_code) 
    if match: 

     # Find and process Hex obfuscated elements 
     for HexObj in match.groupdict()["Hex"]: 
      print match.groupdict()["Hex"] 
      print "Processing:" 
      print HexObj.pattern 
      obf_code.replace(HexObj,chr(int(HexObj),16)) 

     # Find and process Oct obfuscated elements 
     for OctObj in set(match.groupdict()["Oct"]): 
      print "Processing:" 
      print OctObj 
      obf_code.replace(OctObj,chr(int(OctObj),8)) 
    return obf_code 

# Crunch the Data 
def deObfuscate(file_string): 
    # Identify HexOct sections and process 
    match = re.search(pattern["HexOct"],file_string) 
    if match: 
     print "HexOct Obfuscation found." 
     for HexOctObj in match.groupdict()["obf_code"]: 
      print "Processing:" 
      print HexOctObj 
      file_string.replace(HexOctObj,deHexOct(HexOctObj)) 

    # Identify B64 sections and process 
    match = re.search(pattern["Base64"],file_string) 
    if match: 
     print "Base64 Obfuscation found." 
     for B64Obj in match.groupdict()["obf_code"]: 
      print "Processing:" 
      print B64Obj 
      file_string.replace(B64Obj,base64.b64decode(B64Obj)) 

    # Return the (hopefully) deobfuscated string 
    return file_string 

# File to String 
def loadFile(file_path): 
    try: 
     file_data = open(file_path) 
     file_string = file_data.read() 
     file_data.close() 
     return file_string 
    except ValueError,TypeError: 
     print "[ERROR] Problem loading the File: " + file_path 

# String to File 
def saveFile(file_path,file_string): 
    try: 
     file_data = open(file_path,'w') 
     file_data.write(file_string) 
     file_data.close() 
    except ValueError,TypeError: 
     print "[ERROR] Problem saving the File: " + file_path 

#################################################################################### 
# 
# Main body of Script 
# 
#################################################################################### 
# Getting the args 
try: 
    opts, args = getopt.getopt(sys.argv[1:], "hi:o:f", ["help","in","out","format"]) 
except getopt.GetoptError: 
    usage() 
    sys.exit(2) 

# Handling the args 
for opt, arg in opts: 
    if opt in ("-h", "--help"): 
     usage() 
     sys.exit() 
    elif opt in ("-i", "--in"): 
     file_in = arg 
     print "Designated input file: "+file_in 
    elif opt in ("-o", "--out"): 
     file_out = arg 
     print "Designated output file: "+file_out 
    elif opt in ("-f", "--format"): 
     format_code = True 
     print "Code Formatting mode enabled" 

# Checking the input 
if file_in =="": 
    print "[ERROR] - No Input File Specified" 
    usage() 
    sys.exit(2) 

# Checking or assigning the output 
if file_out == "": 
    file_out = file_in+"-deObfuscated.txt" 
    print "[INFO] - No Output File Specified - Automatically assigned: "+file_out 

# Zhu Li, Do the Thing! 
code_string=loadFile(file_in) 
deObf_String=deObfuscate(str(code_string)) 
saveFile(file_out,deObf_String) 

The console output, including my debugging prints, is as follows:

C:\Users\NJB\workspace\python\deObf>deobf.py -i "Form 5138.php" 
Designated input file: Form 5138.php 
[INFO] - No Output File Specified - Automatically assigned: Form 5138.php-deObfuscated.txt 
HexOct Obfuscation found. 
Processing: 
\ 
Processing: 
x 
Processing: 
6 
Processing: 
1 
Processing: 
\ 
Processing: 
1 
Processing: 
5 
Processing: 
6 
Processing: 
\ 
Processing: 
x 
Processing: 
7 
Processing: 
5 
Processing: 
\ 
Processing: 
1 
Processing: 
5 
Processing: 
6 
Processing: 
\ 
Processing: 
x 
Processing: 
6 
Processing: 
1 

Answers

1

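Your regular expression is matching the group just fine, but you are then iterating over the characters in the matched group.

This gives you just the matched string:

match.groupdict()["Hex"]

This iterates over the characters in that string:

for HexObj in match.groupdict()["Hex"]:

What you want instead is to iterate over the search results, so use re.finditer() instead of re.search(). Something like: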

def deHexOct(obf_code): 
    for match in re.finditer(sub_pattern["HexOct"],obf_code): 
     # Find and process Hex obfuscated elements 
     groups = match.groupdict() 
     hex = groups["Hex"] 
     if hex: 
      print "hex:", hex 
      # do processing here 
     oct = groups["Oct"] 
     if oct: 
      print "oct:", oct 
      # do processing here 
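
As a rough sketch of what the "do processing here" steps might look like, the block below decodes each escape with a callback passed to re.sub(). The names HEX_OCT, decode_escape and de_hex_oct are made up for this illustration, the octal digits are narrowed to [0-7] so that int(..., 8) cannot fail, and print() is used so it runs on Python 3 (the question's script targets Python 2). Note two further details in the question's code that would also get in the way: str.replace() returns a new string rather than changing the original, and chr(int(HexObj),16) passes the 16 to chr() instead of int().

```python
import re

# Hypothetical name; pattern adapted from the question's sub_pattern["HexOct"],
# with the octal digits narrowed to [0-7] (an assumption, not the original).
HEX_OCT = re.compile(r'(?P<Hex>\\[xX][\dA-Fa-f]{2})|(?P<Oct>\\[012]?[0-7]{2})')


def decode_escape(match):
    """Turn one matched escape sequence into the character it encodes."""
    hex_text = match.group("Hex")
    if hex_text is not None:
        # Drop the leading backslash and x, then parse the digits as base 16.
        return chr(int(hex_text[2:], 16))
    # Otherwise the Oct group matched: drop the backslash, parse as base 8.
    return chr(int(match.group("Oct")[1:], 8))


def de_hex_oct(obf_code):
    """Replace every hex/octal escape in obf_code with its decoded character."""
    return HEX_OCT.sub(decode_escape, obf_code)


# The escape sequence that appears character by character in the console output.
sample = r'\x61\156\x75\156\x61'

# Iterating over a string yields single characters, which is the original bug.
print(list(r'\x61'))

# finditer() yields one match object per escape sequence instead.
print([m.group(0) for m in HEX_OCT.finditer(sample)])

print(de_hex_oct(sample))  # prints anuna
```

Using a callback with sub() avoids the separate find-then-replace passes, so each escape is decoded exactly where it was found.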

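Also, the r in front of the string just stops Python from interpreting backslashes as escapes, and it is needed for regular expressions because they use backslashes for escapes as well. An alternative is to double every backslash in the regular expression; then you don't need the r prefix, but the regex may become even less readable.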
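
A small illustration of that point, using the hex part of the question's pattern. The variable names are chosen for this example only, and print() is used so the snippet runs on Python 3.

```python
import re

# The same regex written twice: once as a raw string, once with every
# backslash doubled in an ordinary string literal.
raw = r'\\[xX][\dA-Fa-f]{2}'
doubled = '\\\\[xX][\\dA-Fa-f]{2}'

print(raw == doubled)  # prints True: both literals produce the same pattern text

# Without the r prefix, Python itself consumes the escape before re sees it.
print('\x61' == 'a')   # prints True: the literal is the single character a
print(len(r'\x61'))    # prints 4: backslash, x, 6, 1 are kept as written

# Both spellings find the same escape sequences in the text.
text = r'"\x61\x75"'
print(re.findall(raw, text) == re.findall(doubled, text))  # prints True
```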

+0

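Thanks for pointing out the character issue. Unfortunately, I get an error with findall stating that the returned match object has no groupdict() method. Interestingly, printing it shows that it is a dictionary without keys. I've had a little more luck with finditer(), but I'm still scratching my head. – Minothor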

+1

Sorry, I should have said `finditer`. `finditer` returns match objects, `findall` only returns strings. – Duncan
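
To make the difference concrete, here is a minimal sketch with a two-group version of the question's pattern (the names pattern, sample, found and matches are illustrative, and the octal digits are narrowed to [0-7] as an assumption). When a pattern contains more than one group, findall() returns a list of tuples of strings, which would explain output that looks like a dictionary without keys and has no groupdict() method.

```python
import re

pattern = re.compile(r'(?P<Hex>\\[xX][\dA-Fa-f]{2})|(?P<Oct>\\[012]?[0-7]{2})')
sample = r'\x61\156'

# findall() returns plain strings; with two groups, one tuple per match,
# with '' standing in for the group that did not take part.
found = pattern.findall(sample)
print(found)

# finditer() yields match objects, which do have groupdict().
matches = list(pattern.finditer(sample))
for m in matches:
    print(m.groupdict())
```

In the groupdict() results the group that did not match is None rather than '', so a check such as `if groups["Hex"]:` works either way.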

+0

With that change, I have it working! Now I just need to boil my work down to the DRYest possible solution. – Minothor
