2013-12-08

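Regex in .NET: trying to capture groups with a lookahead over a repeating pattern

Please note that I am using the .NET regex engine here.

Here is the string to parse: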

<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 1);" onmouseout="ResidentialListings.degradeListing(this, 1);"> 

    <div id="Contact1" class="listingDetail"> 

     <span id="ContactName1" class="c411ListedName"><a href="/res/5068300124/P-DESCHESNES/184421926.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES on 85 Red Pine Dr">P DESCHESNES</a></span> 

     <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span> 

     <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span> 


     <a class="c411GetDirections c411NoPrint" id="ContactDirections1" href="/map/mapSearch.html?layers=dir&amp;from=85+Red+Pine+Dr+NB&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a> 


    </div> 
    <div class="c411HoverMarker c411NoPrint" style="display:none;"> 
     <a href="/res/5068300124/P-DESCHESNES/184421926.html" title="P DESCHESNES"><span>&nbsp;</span></a> 
    </div> 
</div> 




<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 2, 0);" onmouseout="ResidentialListings.degradeListing(this, 2, 0);"> 

    <div id="Contact2" class="listingDetail"> 

     <span id="ContactName2" class="c411ListedName"><a href="/res/4189883202/P-Deschesnes/179906536.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P Deschesnes on 6585 Rue des Orchid&eacute;es">P Deschesnes</a></span> 

     <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span> 

     <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span> 


     <a class="c411GetDirections c411NoPrint" id="ContactDirections2" href="/map/mapSearch.html?layers=dir&amp;from=1000+Rue+des+Orchid%C3%A9esFictive+QC+G1X+3Z5&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a> 


    </div> 
    <div class="c411HoverMarker c411NoPrint" style="display:none;"> 
     <a href="/res/4189883202/P-Deschesnes/179906536.html" title="P Deschesnes"><span>&nbsp;</span></a> 
    </div> 
</div> 




<div class="c411Listing" onmouseover="ResidentialListings.enhanceListing(this, 3, 0);" onmouseout="ResidentialListings.degradeListing(this, 3, 0);"> 

    <div id="Contact3" class="listingDetail"> 

     <span id="ContactName3" class="c411ListedName"><a href="/res/4506702257/P-DESCHESNES/181606171.html" onclick="utagsave();" onmousedown="utag.link({link_name:'person_name', link_attr1:'in_listing'})" title="P DESCHESNES on 1736 Rue Saint-Alexandre">P DESCHESNES</a></span> 

     <span class="c411Phone" id="ContactPhone3">(450) 671-1111</span> 

     <span class="c411ListingGeo"><span class="adr" id="ContactAddress3">1736 Rue Fictive Longueuil QC J1J 1T2</span></span> 


     <a class="c411GetDirections c411NoPrint" id="ContactDirections3" href="/map/mapSearch.html?layers=dir&amp;from=1000+Rue+Saint-Fictive+Longueuil+QC+J1J+1T1&amp;what=P+Deschesnes&amp;where=Canada" onmousedown="utag.link({link_name:'direction', link_attr1:'in_listing'});" rel="nofollow">Get directions&nbsp;<span>&rarr;</span></a> 


    </div> 
    <div class="c411HoverMarker c411NoPrint" style="display:none;"> 
     <a href="/res/4506702257/P-DESCHESNES/181606171.html" title="P DESCHESNES"><span>&nbsp;</span></a> 
    </div> 
</div> 

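You can see the repeating pattern here. I want one match per contact (1, 2, 3), each with 3 groups inside: contact name, phone, and address.

For this example I should get 3 matches, each containing the name, phone, and address, but for some reason I only get the last phone and address.

Here is my .NET regex so far: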

(?si)(?(?=.*<div id="Contact[\d{1,2}]").*<span id="ContactName[\d{1,2}]\".*title=.*>(.*)</a>.*id="ContactPhone[\d{1,2}]">(.*)</span>.*id="ContactAddress[\d{1,2}]\">(.*)</span>) 

Could you please tell me what I am doing wrong?

Answers


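For very simple HTML fragments, regular expressions can be useful. For more extensive content such as your example, an HTML parser like Html Agility Pack is probably the most robust solution.

There are good reasons not to try parsing HTML with regular expressions: Using regular expressions to parse HTML: why not?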
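
To give a rough idea of what the parser route looks like, here is a minimal sketch. It uses Python's standard-library `html.parser` purely as a stand-in for Html Agility Pack, not the Html Agility Pack API itself. The `ContactParser` class and the trimmed `html` sample are hypothetical names made up for this illustration, and the sketch assumes the tracked elements contain no void tags such as `<br>`.

```python
from html.parser import HTMLParser

# Trimmed stand-in for the markup in the question (assumption: same id scheme).
html = '''
<div id="Contact1" class="listingDetail">
 <span id="ContactName1" class="c411ListedName"><a href="/res/1.html" title="P DESCHESNES">P DESCHESNES</a></span>
 <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
 <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
</div>
<div id="Contact2" class="listingDetail">
 <span id="ContactName2" class="c411ListedName"><a href="/res/2.html" title="P Deschesnes">P Deschesnes</a></span>
 <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
 <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
</div>
'''


class ContactParser(HTMLParser):
    """Hypothetical helper: collects the text of elements whose id is
    ContactName<N>, ContactPhone<N> or ContactAddress<N>."""

    FIELDS = ("ContactName", "ContactPhone", "ContactAddress")

    def __init__(self):
        super().__init__()
        self.contacts = {}    # contact number -> {field: text}
        self._current = None  # (number, field) while inside a tracked element
        self._depth = 0       # nesting depth inside the tracked element

    def handle_starttag(self, tag, attrs):
        if self._current:
            self._depth += 1  # nested tag, e.g. the <a> inside the name span
            return
        elem_id = dict(attrs).get("id") or ""
        for field in self.FIELDS:
            number = elem_id[len(field):]
            if elem_id.startswith(field) and number.isdigit():
                self._current = (number, field)
                self._depth = 1
                self.contacts.setdefault(number, {})[field] = ""

    def handle_endtag(self, tag):
        if self._current:
            self._depth -= 1
            if self._depth == 0:  # left the tracked element
                self._current = None

    def handle_data(self, data):
        if self._current:
            number, field = self._current
            self.contacts[number][field] += data


parser = ContactParser()
parser.feed(html)
for number, fields in parser.contacts.items():
    print(number, fields)
```

The point of the design is that the fields are located by element id rather than by whatever text happens to sit between them, so attribute order and whitespace stop mattering.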

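I understand that and I will look into it further, but could you help me find a solution to the above with a regex and an example? It would help me understand how .NET regex behaves in this case. – Pilouk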

It *might* work. You need to use the non-greedy modifier on the `.*`s: `.*?`. –
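
To illustrate the non-greedy suggestion, here is a minimal sketch rather than a definitive fix. With `(?s)` on, each greedy `.*` in the original can run to the end of the input and backtrack only as far as needed, which would explain why only the last phone and address come back. Note also that `[\d{1,2}]` is a character class matching a single character, not "one or two digits". The sketch drops the conditional lookahead, uses `\d+` and lazy `.*?`, and is written in Python only so it can be run standalone. The pattern uses only constructs (`(?s)`, lazy quantifiers, capture groups) that the .NET engine also supports, so the same pattern string should be usable with `Regex.Matches`. The trimmed `html` sample is an assumption standing in for the full page.

```python
import re

# Trimmed stand-in for the listing markup in the question (two contacts only).
html = '''
<div id="Contact1" class="listingDetail">
 <span id="ContactName1" class="c411ListedName"><a href="/res/1.html" title="P DESCHESNES on 85 Red Pine Dr">P DESCHESNES</a></span>
 <span class="c411Phone" id="ContactPhone1">(506) 830-2224</span>
 <span class="c411ListingGeo"><span class="adr" id="ContactAddress1">85 Fictive Dr NB</span></span>
</div>
<div id="Contact2" class="listingDetail">
 <span id="ContactName2" class="c411ListedName"><a href="/res/2.html" title="P Deschesnes on 6585 Rue des Orchid&eacute;es">P Deschesnes</a></span>
 <span class="c411Phone" id="ContactPhone2">(418) 987-3202</span>
 <span class="c411ListingGeo"><span class="adr" id="ContactAddress2">1000 Rue des Fictive QC G1X 3Z5</span></span>
</div>
'''

# (?s) lets "." cross newlines; every ".*?" is lazy, so each match stops at
# the first name/phone/address that follows instead of the last one.
pattern = re.compile(
    r'(?s)id="ContactName\d+".*?<a\b[^>]*>(.*?)</a>'
    r'.*?id="ContactPhone\d+"[^>]*>(.*?)</span>'
    r'.*?id="ContactAddress\d+"[^>]*>(.*?)</span>'
)

# One tuple (name, phone, address) per contact.
contacts = pattern.findall(html)
for name, phone, address in contacts:
    print(name, "|", phone, "|", address)
```

Because each match consumes exactly one contact block, the next search resumes after it, giving one match per contact with three groups each.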