2017-04-12

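Python: improving regex-based output parsing

I am writing a function that takes some output and populates objects in a dictionary based on its content. The objects fall into two groups, and depending on which part of the text document the function is currently processing, I identify Type1 or Type2 objects in the output and populate them with the relevant data. Type1 objects are normally found in the State1 section of the document, and Type2 objects in State2.

I rely mostly on elif statements and process each line of the input text file (which enters the function as a list), using regular expressions to look up its content. However, the code is becoming unmanageable: I am funnelling every line through all the ifs.

Is there a way to make this code better?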

def func(list): 

    #defining function related variables 
    state = '' 
    state1_specific_value1 = '' 
    state1_specific_value2 = '' 
    state1_specific_value3 = '' 
    state2_specific_value1 = '' 
    state2_specific_value2 = '' 
    state2_specific_value3 = '' 

    for i in list: 

     if REGEXP_DICTIONARY['state1_regexp'].match(i): 
      # processing state1 section 
      state = 'State1' 
     elif REGEXP_DICTIONARY['state2_regexp'].match(i): 
      # processing state2 section 
      state = 'State2' 
     elif REGEXP_DICTIONARY['interesting_line1_regexp'].match(i): 
      # detecting some special conditions for a jar. Is it twistable? 
      # not dependent on state 
      jar_dict[jar].Twistable = True 

     elif REGEXP_DICTIONARY['type'].match(i): 
      jar_type = clean(i.replace(" blablabla ", "")) # quick clean up jar related string to get jar's name. 
      # 
      # making decisions based on State delivered from previous lines and Type detected 
      # 
      if state == "State1" and jar_type == "Type1": 
       debug("We detected State1 and Type 1") 
      elif state == "State2" and jar_type == "Type2": 
       debug("We detected State2 and Type 2") 
      else: 
       debug("inconsistency detected: type is {}, state is {}".format(jar_type, state)) 

     # State 1 Type1 related block 
     elif REGEXP_DICTIONARY['type1_state1_related regexp'].match(i) and state == "State1": 
      #do_something 

     elif ... 
     elif ... 
     elif ... 
     elif ... 

     # 
     # State 2 Type2 related block 
     elif REGEXP_DICTIONARY['type2_state2_related regexp'].match(i) and state == "State2": 
      #do_something 
     elif ... 
     elif ... 
     elif ... 
     elif ... 

Answers

0

I think you should split your code into small logical blocks, each of which does one thing. Something like this:

def _get_object_type(obj): 
    """I'm getting type of one object""" 
    ... 

def _process_type_1(type_1_object): 
    """I'm processing type 1 objects""" 
    ... 

def _process_type_2(type_2_object): 
    """I'm processing type 2 objects""" 
    ... 

def _process_object(obj, obj_type): 
    """I'm processing object by types""" 
    if obj_type == "type_1": 
     return _process_type_1(obj) 
    if obj_type == "type_2": 
     return _process_type_2(obj) 
    ... 

def populate(raw_input): 
    """I'm populating populated dict from raw_input""" 
    populated = {} 

    for elem in raw_input: 
     elem_type = _get_object_type(elem) 
     processed_elem = _process_object(elem, elem_type) 
     ...  

That way your code will be cleaner, and each small piece of it will be easy to understand :).
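One way to make the "one small function per action" idea concrete is a rule table that pairs each compiled pattern with its handler, so the long elif chain becomes a loop. This is only a minimal sketch: the patterns, the handler names, and the `record` dictionary are hypothetical stand-ins, not the asker's actual REGEXP_DICTIONARY or jar objects.

```python
import re

# Hypothetical handlers: each one does a single thing to the record.
def handle_twistable(record, match):
    record["twistable"] = True

def handle_type(record, match):
    record["type"] = match.group("name")

# Hypothetical patterns, listed in priority order like the elif chain.
RULES = [
    (re.compile(r"twistable"), handle_twistable),
    (re.compile(r"type:\s*(?P<name>\w+)"), handle_type),
]

def populate(lines):
    """Run each line through the rule table; the first matching rule wins."""
    record = {}
    for line in lines:
        for pattern, handler in RULES:
            match = pattern.search(line)
            if match:
                handler(record, match)
                break  # stop at the first hit, as an elif chain would
    return record
```

Adding a new kind of line then means adding one handler and one entry in RULES, rather than another elif branch.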

0

Python's re module supports named groups with the syntax (?P<name>...).

That means you can create regular expressions like this:

state1_regexp = r"(?P<state1>some text that matches state1)" 
state2_regexp = r"(?P<state2>some different text for state2)" 

You can then paste your regular expressions together into one big alternation:

all_states = '|'.join([state1_regexp, state2_regexp]) 

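Now you have a single catch-all regular expression: the named groups joined with |.

If you match against that catch-all regex, you get a result if any of the patterns hits: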

m = re.search(all_states, text) 

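You can access the results with the m.groupdict() method, which returns a dictionary containing all the named subgroups and what they matched. If the value for a subgroup's key is None, that subgroup did not match.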

states = { k:v for k,v in m.groupdict().items() if v is not None} 
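One detail worth guarding against: re.search returns None when no alternative matches, and calling groupdict() on None raises AttributeError. A small wrapper, sketched here with the hypothetical name matched_states and the foo/bar demo patterns, returns an empty dictionary in that case.

```python
import re

# Demo patterns only; real patterns would replace foo and bar.
ALL_STATES = re.compile(r"(?P<state1>foo)|(?P<state2>bar)")

def matched_states(text):
    """Return {group_name: matched_text} for the groups that matched."""
    m = ALL_STATES.search(text)
    if m is None:
        # No alternative matched anywhere in the text.
        return {}
    return {k: v for k, v in m.groupdict().items() if v is not None}
```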

Here is a demo version:

import re 
state1 = r'(?P<state1>foo)' 
state2 = r'(?P<state2>bar)' 
all_re = '|'.join([state1, state2]) 
text = "eat your own foo" 
m = re.search(all_re, text) 
states = {k:v for k,v in m.groupdict().items() if v is not None} 
print(states)  # {'state1': 'foo'}

Once you have a states dictionary, you can check that it has only one key (only one state matched at a time). Or not: maybe two states can match at once!
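For a pattern built purely as an alternation of top-level named groups, a single match object can only have one of those groups set, because only one alternative matches at a given position. If a line may contain several states, one option is to scan it with finditer and read each match's lastgroup. The sketch below assumes the foo/bar demo patterns; states_in is a hypothetical helper name.

```python
import re

ALL_RE = re.compile(r"(?P<state1>foo)|(?P<state2>bar)")

def states_in(text):
    """List the group name of every match in the text, left to right."""
    return [m.lastgroup for m in ALL_RE.finditer(text)]

# A single search stops at the first hit, so only one group is set.
first = ALL_RE.search("foo bar")
single = {k: v for k, v in first.groupdict().items() if v is not None}
```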

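Either way, you can iterate over the keys and use attribute names, a function-lookup dictionary, or whatever technique you like to call the state-specific code: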

def handle_state1(): 
    pass 
def handle_state2(): 
    pass 
dispatch = { 
    'state1' : handle_state1, 
    'state2' : handle_state2, 
} 

for k in states.keys(): 
    dispatch[k]()
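Putting the pieces together for line-by-line input, the combined regex and the dispatch dictionary can drive a small parser that remembers which section it is in. This is a sketch under stated assumptions: the section headers ("== State1 =="), the "item <name>" line format, and every function name here are invented for illustration and would need to be replaced with the real patterns.

```python
import re

# Hypothetical line formats, one top-level named group per kind of line.
LINE_RE = re.compile("|".join([
    r"(?P<state1>== State1 ==)",
    r"(?P<state2>== State2 ==)",
    r"(?P<item>item\s+\w+)",
]))

def enter_state1(ctx, text):
    ctx["state"] = "state1"

def enter_state2(ctx, text):
    ctx["state"] = "state2"

def add_item(ctx, text):
    # File the item under the current section; skip it if no section yet.
    if ctx["state"] is not None:
        ctx["items"][ctx["state"]].append(text.split()[1])

DISPATCH = {"state1": enter_state1, "state2": enter_state2, "item": add_item}

def parse(lines):
    """Track the current section and collect item names per section."""
    ctx = {"state": None, "items": {"state1": [], "state2": []}}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # lastgroup is the name of the alternative that matched.
            DISPATCH[m.lastgroup](ctx, m.group(m.lastgroup))
    return ctx["items"]
```

Lines that match none of the alternatives are skipped, and the state-dependent logic lives in the handlers instead of in `and state == ...` conditions on every branch.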