2009-08-27 275 views
0

Slow regular expression in Python?

I am trying to match strings of this kind:

{@csm.foo.bar} 

without matching any of these:

{@[email protected]} 
{@csm.foo.bar-42} 

The regular expression I am using is

r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}" 

It gets dog slow if the string contains multiple matches. Why? It runs very fast if I take away the brace matching, like this

r"@csm.((?:[a-zA-Z0-9_]+\.?)+)" 

but that is not what I want.

Any ideas?

Here is a sample input:

<dockLayout id="popup" y="0" x="0" width="{@csm.screenWidth}" height="{@csm.screenHeight}"> 
    <dataNumber id="selopacity_Volt" name="selopacity_Volt" value="0" /> 
    <dataNumber id="selopacity_Amp" name="selopacity_Amp" value="0" /> 
    <animate trigger="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" triggerOn="*" targetNode="selopacity_Volt" targetAttr="value" to="1" dur="0ms" ease="in" /> 
    <animate trigger="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" triggerOn="65024" targetNode="selopacity_Volt" targetAttr="value" to="0" dur="0ms" ease="in" /> 
    <animate trigger="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" triggerOn="*" targetNode="selopacity_Amp" targetAttr="value" to="1" dur="0ms" ease="in" /> 
    <animate trigger="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" triggerOn="65024" targetNode="selopacity_Amp" targetAttr="value" to="0" dur="0ms" ease="in" /> 
    <dockLayout id="item" width="{@csm.screenWidth}" height="{@csm.screenHeight}" depth="-1" clip="false" xmlns="http://www.tat.se/kastor/kml" > 
    <dockLayout id="list_item_title" x="0" width="{@csm.screenWidth}" height="{@[email protected]_y}"> 
     <text id="volt_amp_text" x="0" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemUnselColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="{ItemTitle}" />    
    </dockLayout>  
    <dockLayout id="gear_layout" y="0" x="0" width="{@csm.screenWidth}" height="{@[email protected]_y}"> 
     <image id="battery_image" x="0" dockLayout.halign="left" dockLayout.valign="bottom" opacity="1" src="{@m_MenuModel.Gauges.VoltAmpereMeter.image}"/> 
    </dockLayout> 
    <!--DockLayout for Voltage Value--> 
    <dockLayout id="volt_value" x="0" width="{@[email protected]_x}" height="{@[email protected]_y}"> 
     <text id="volt_value_text" x="0" opacity="{selopacity_Volt*selopacity_Amp}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="right" dockLayout.valign="bottom" string="{@m_ds_ML.VIMPBM_BatteryVoltage.valstr}" >  
     </text> 
    </dockLayout> 
    <!--DockLayout for Voltage Unit--> 
    <dockLayout id="volt_unit" x="{@[email protected]_x}" width="{@csm.screenWidth}" height="{@cs[email protected]_y}"> 
     <text id="volt_unit_text" x="0" opacity="{selopacity_Volt*selopacity_Amp}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="V" >   
     </text> 
    </dockLayout> 
    <!--DockLayout for Ampere Value--> 
    <dockLayout id="ampere_value" x="0" width="{@[email protected]_x}" height="{@[email protected]_y}"> 
     <text id="ampere_value_text" x="0" opacity="{selopacity_Amp*selopacity_Volt}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="right" dockLayout.valign="bottom" string="{@m_ds_ML.VIMPBM_BatteryCurrent.valstr}" > 
     </text> 
    </dockLayout> 
    <!--DockLayout for Ampere Unit--> 
    <dockLayout id="ampere_unit" x="{@[email protected]_x}" width="{@csm.screenWidth}" height="{@[email protected]_y}"> 
     <text id="ampere_unit_text" x="0" opacity="{selopacity_Amp*selopacity_Volt}" ellipsize="false" font="{@csm.listUnselFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="A" >   
     </text> 
    </dockLayout> 
    <!--DockLayout for containing Data Not Available text--> 
    <dockLayout id="no_data_textline" x="{@[email protected]_x}" width="{@csm.screenWidth}" height="{@[email protected]_y}"> 
     <text id="no_data_text" x="0" opacity="{1-(selopacity_Amp*selopacity_Volt)}" ellipsize="false" font="{@csm.listSelFont}" color="{@csm.itemSelColor}" dockLayout.halign="left" dockLayout.valign="bottom" string="{text1}" >   
     </text> 
    </dockLayout> 
    <!--<rect id="test_rect1" x="{151-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" /> 
       <rect id="test_rect1" x="{237-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" /> 
       <rect id="test_rect1" x="{160-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" /> 
       <rect id="test_rect1" x="{246-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" /> 
       <rect id="test_rect8" x="0" y="{161-40}" width="320" height="1" opacity="1" fill="#00ff00" /> 
       <rect id="test_rect1" x="{109-28}" y="0" width="1" height="240" opacity="1" fill="#00ff00" />--> 
    </dockLayout> 
</dockLayout> 
+0

If you are trying to build your own template language, this is probably the wrong way to go about it. – 2009-08-27 19:03:32

+0

Hmm, strange... timeit shows no major difference between the versions with and without the braces. Can you give a test case? – 2009-08-27 20:43:53

+0

I am not making a template language; I am trying to post-process some configuration files that use this format. – 2009-08-28 06:52:32

Answers

4

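Could you supply a test case of a string on which the first match is "dog slow"? By the way, though I don't know how it affects performance, there is an imprecision in the RE: it matches any single character after the starting {@csm, not just a dot. A better expression (possibly faster, since it has no "optional" dot) might be: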

r'\{@csm((?:\.\w+)+)\}' 
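As a quick check, the suggested pattern can be run against strings like the ones in the question. This is only a sketch: `find_refs` and the `sample` string are made up for illustration, and stripping the leading dot from the captured group is a convenience, not something the answer requires.

```python
import re

# Pattern suggested in this answer: one or more ".name" segments, then "}".
PATTERN = re.compile(r'\{@csm((?:\.\w+)+)\}')

def find_refs(text):
    """Return the dotted path (leading dot stripped) of every match in text."""
    return [m.group(1).lstrip('.') for m in PATTERN.finditer(text)]

# Hypothetical input mixing two valid references with one that must not match.
sample = 'width="{@csm.screenWidth}" x="{@csm.foo.bar-42}" y="{@csm.foo.bar}"'
print(find_refs(sample))  # ['screenWidth', 'foo.bar']
```

Because every segment must start with a literal dot, there is only one way to split a name into segments, so a failed match is rejected without trying many alternatives.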
+0

More concise as well – 2009-08-27 17:53:32

+0

That fixed it! Thank you very much. – 2009-08-28 09:12:51

0

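I'm not exactly a regex expert, but it may be due to the brace at the end of the match. You might try matching r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)" and simply checking by hand whether a closing brace follows.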
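A minimal sketch of that idea, assuming the goal is to substitute values for the references: match without the closing brace, check the next character by hand, and stitch the output together manually instead of calling `re.sub`. The `substitute` function and the `values` lookup table are hypothetical, not part of the original post.

```python
import re

# Pattern from this answer: the question's regex minus the closing brace.
OPEN_REF = re.compile(r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)")

def substitute(text, values):
    """Replace {@csm.path} with values[path]; leave everything else alone.

    `values` is a hypothetical lookup table used only for illustration.
    """
    out = []
    pos = 0
    for m in OPEN_REF.finditer(text):
        end = m.end()
        # Manual check: accept only if the very next character is "}".
        if text[end:end + 1] != "}" or m.group(1) not in values:
            continue
        out.append(text[pos:m.start()])
        out.append(str(values[m.group(1)]))
        pos = end + 1  # skip the closing brace as well
    out.append(text[pos:])
    return "".join(out)

print(substitute('w="{@csm.screenWidth}" h="{@csm.foo.bar-42}"',
                 {"screenWidth": 320}))
# w="320" h="{@csm.foo.bar-42}"
```

The greedy match never has to be undone here, because nothing after the group can fail inside the regex itself.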

+0

Yes, that does make a big difference. – 2009-08-28 06:45:26

+0

But then I can't use re.sub... – 2009-08-28 07:00:32

0

You may need to give a better example of what is slow. For a fairly long string containing things that do and do not match:

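import re

# Build a long input that alternates non-matching and matching references.
x = "".join('{@csm.foo.bar-%d}\n{@csm.foo.%dx.baz}\n' % (a, a)
            for a in range(10000))
mymatch = r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}"

# Print every full match found in the input.
for y in re.finditer(mymatch, x):
    print(y.group(0))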

This works fine, but with a long enough string that the pattern fails to match, you can run into problems.
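One way to probe for such a case is to time a single search on a near-miss: a long name followed by something other than the closing brace. The sketch below assumes the slowdown comes from the nested quantifiers in the question's pattern, which let the engine split one name into segments in many ways before giving up. The helper `time_search` and the test strings are made up for illustration, and the second pattern is the one proposed in another answer on this page.

```python
import re
import time

ORIGINAL = re.compile(r"\{@csm.((?:[a-zA-Z0-9_]+\.?)+)\}")
REVISED = re.compile(r"\{@csm((?:\.\w+)+)\}")

def time_search(pattern, text):
    """Return (match, seconds) for a single pattern.search call."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start

# Hypothetical near-miss: a name of n characters followed by "-" instead of "}".
for n in (10, 14, 18):
    text = "{@csm." + "a" * n + "-1}"
    _, slow = time_search(ORIGINAL, text)
    _, fast = time_search(REVISED, text)
    print(f"n={n:2d}  original={slow:.5f}s  revised={fast:.5f}s")
```

If the assumption holds, the time for the original pattern should grow sharply as `n` increases while the revised one stays flat. Keep `n` small when experimenting, since the growth is steep.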