2010-04-03 65 views
0

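I have a grep problem. Which pattern should I use with PHP's preg_grep to extract the content of every "__________" instance in the strings below? In short: what pattern extracts href attributes and similar content with PHP's preg_grep?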

1. <h2><a ....>_____</a></h2> 
2. <cite><a href="_____" .... >...</a></cite> 
3. <cite><a .... >________</a></cite> 
4. <span>_________</span> 

The dots stand for arbitrary characters, and the underscores mark the content I want to extract.

The sample string is:

 </style></head> 
<body><div id="adBlock"><h2><a href="https://www.google.com/adsense/support/bin/request.py?contact=afs_violation&amp;hl=en" target="_blank">Ads by Google</a></h2> 
<div class="ad"><div><a href="http://www.google.com/aclk?sa=L&amp;ai=C4vfT4Sa3S97SLYO8NN6F-ckB5oq5sAGg6PKlDaT-kwUQASCF4p8UKARQtobS9AVgyZbRhsijoBnIAQGqBBxP0OSEnIsuRIv3ZERDm8GiSKZSnjrVf1kVq-_Y&amp;num=1&amp;sig=AGiWqtwG1qHnwpZ_5BNrjrzzXO5Or6EDMg&amp;q=http://www.crackle.com/c/Spider-Man_The_New_Animated_Series/%3Futm_source%3Dgoogle%26utm_medium%3Dcpc%26utm_campaign%3DGST_10016_CRKL_US_PRD_S_TeleV_SPID_Tele_Spider-Man%26utm_term%3Dspiderman%26utm_content%3Ds264Yjg9f_3472685742_487lrz1638" class="titleLink" target="_parent">Spider-<b>Man</b> Animated Serie</a></div> 
<span>See Your Favorite Spiderman 
<br> 
Episodes for Free. Only on Crackle.</span> 
<cite><a href="http://www.google.com/aclk?sa=L&amp;ai=C4vfT4Sa3S97SLYO8NN6F-ckB5oq5sAGg6PKlDaT-kwUQASCF4p8UKARQtobS9AVgyZbRhsijoBnIAQGqBBxP0OSEnIsuRIv3ZERDm8GiSKZSnjrVf1kVq-_Y&amp;num=1&amp;sig=AGiWqtwG1qHnwpZ_5BNrjrzzXO5Or6EDMg&amp;q=http://www.crackle.com/c/Spider-Man_The_New_Animated_Series/%3Futm_source%3Dgoogle%26utm_medium%3Dcpc%26utm_campaign%3DGST_10016_CRKL_US_PRD_S_TeleV_SPID_Tele_Spider-Man%26utm_term%3Dspiderman%26utm_content%3Ds264Yjg9f_3472685742_487lrz1638" class="domainLink" target="_parent">www.Crackle.com/Spiderman</a></cite></div> <div class="ad"><div><a href="http://www.google.com/aclk?sa=l&amp;ai=CnQFi4Sa3S97SLYO8NN6F-ckB3M7nQtyU2PQEq6bCBRACIIXinxQoBFCm15KB-f____8BYMmW0YbIo6AZoAHiq_X-A8gBAaoEIU_Q9JKLiy1MiwdnHpZoBnmpR1J8pP2jpTwMx2uj2nN4WA&amp;num=2&amp;sig=AGiWqtwDrI5pWBCncdDc80FKt32AJMAQ6A&amp;q=http://www.costumeexpress.com/browse/TV-Movies/_/N-1z141uu/Ntt-batman/results1.aspx%3FREF%3DKNC-CEgoogle" class="titleLink" target="_parent">Kids <b>Batman</b> Costumes</a></div> 

<span>Great Selection of <b>Batman</b> &amp; Batgirl 
<br> 
Costumes For Kids. Ships Same Day!</span> 
<cite><a href="http://www.google.com/aclk?sa=l&amp;ai=CnQFi4Sa3S97SLYO8NN6F-ckB3M7nQtyU2PQEq6bCBRACIIXinxQoBFCm15KB-f____8BYMmW0YbIo6AZoAHiq_X-A8gBAaoEIU_Q9JKLiy1MiwdnHpZoBnmpR1J8pP2jpTwMx2uj2nN4WA&amp;num=2&amp;sig=AGiWqtwDrI5pWBCncdDc80FKt32AJMAQ6A&amp;q=http://www.costumeexpress.com/browse/TV-Movies/_/N-1z141uu/Ntt-batman/results1.aspx%3FREF%3DKNC-CEgoogle" class="domainLink" target="_parent">www.CostumeExpress.com</a></cite></div> <div class="ad"><div><a href="http://www.google.com/aclk?sa=l&amp;ai=CAMYT4Sa3S97SLYO8NN6F-ckB3ZnWmgGdoNLrDaumwgUQAyCF4p8UKARQrqSVxwdgyZbRhsijoBmgAZH77uwDyAEBqgQYT9DU7oqLLEyLB2dHlxZFnQzyeg-yHt88&amp;num=3&amp;sig=AGiWqtzqAphZ9DLDiEFBJlb0Ou_1HyEyyA&amp;q=http://www.OfficialBatmanCostumes.com" class="titleLink" target="_parent"><b>Batman</b> Costume</a></div> 
<span>Official <b>Batman</b> Costumes. 

<br> 
Huge Selection &amp; Same Day Shipping!</span> 
<cite><a href="http://www.google.com/aclk?sa=l&amp;ai=CAMYT4Sa3S97SLYO8NN6F-ckB3ZnWmgGdoNLrDaumwgUQAyCF4p8UKARQrqSVxwdgyZbRhsijoBmgAZH77uwDyAEBqgQYT9DU7oqLLEyLB2dHlxZFnQzyeg-yHt88&amp;num=3&amp;sig=AGiWqtzqAphZ9DLDiEFBJlb0Ou_1HyEyyA&amp;q=http://www.OfficialBatmanCostumes.com" class="domainLink" target="_parent">www.OfficialBatmanCostumes.com</a></cite></div> <div class="ad"><div><a href="http://www.google.com/aclk?sa=l&amp;ai=C767t4Sa3S97SLYO8NN6F-ckBkZfSfoOppaMHq6bCBRAEIIXinxQoBFDX2bw6YMmW0YbIo6AZoAHpprP8A8gBAaoEG0_QhJSMiytMiwdnHpZoF3g0Uj8_Vl2r4TpI_g&amp;num=4&amp;sig=AGiWqtyGO2DnFq_jMhP6ufj8pufT9sWQWA&amp;q=http://www.discountsuperherocostumes.com/batman-costumes.html" class="titleLink" target="_parent">Discount <b>Batman</b> Costumes</a></div> 
<span>Discount adult and kids <b>batman</b> 
<br> 
superhero costumes.</span> 

<cite><a href="http://www.google.com/aclk?sa=l&amp;ai=C767t4Sa3S97SLYO8NN6F-ckBkZfSfoOppaMHq6bCBRAEIIXinxQoBFDX2bw6YMmW0YbIo6AZoAHpprP8A8gBAaoEG0_QhJSMiytMiwdnHpZoF3g0Uj8_Vl2r4TpI_g&amp;num=4&amp;sig=AGiWqtyGO2DnFq_jMhP6ufj8pufT9sWQWA&amp;q=http://www.discountsuperherocostumes.com/batman-costumes.html" class="domainLink" target="_parent">www.discountsuperherocostumes.com</a></cite></div></div></body> 
<script type="text/javascript"> 
     var relay = ""; 
    </script> 
<script type="text/javascript" src="/uds/?file=ads&amp;v=1&amp;packages=searchiframe&amp;nodependencyload=true"></script></html> 

Thanks!

+3

Don't... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Joey 2010-04-03 11:50:24

Answers

4

First of all, you should not use regular expressions to extract data from an HTML string.
Instead, you should use a DOM parser.

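Here, you can use DOMDocument::loadHTML to load the HTML string and the DOMXPath class to run XPath queries against it.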


For example, you can load the document and instantiate the DOMXPath class like this:

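$html = <<<HTML
....
....
HTML;

$dom = new DOMDocument();
// The @ silences the warnings libxml raises on real-world, non-well-formed HTML.
@$dom->loadHTML($html);

// DOMXPath runs XPath queries against the parsed document.
$xpath = new DOMXPath($dom);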

Then use XPath queries to find the elements you are looking for.
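
The `@` operator hides every warning raised by `loadHTML`, not only the ones about sloppy markup. A minimal sketch of an alternative, assuming you would rather collect the parser warnings than suppress them, uses `libxml_use_internal_errors()`. The `$html` snippet here is a made-up, shortened stand-in for the real page:

```php
<?php
// Hypothetical, shortened stand-in for the real ad markup.
$html = '<html><body><h2><a href="https://example.com/">Ads by Google</a></h2>'
      . '<span>See Your Favorite<br>Episodes</span></body></html>';

// Collect libxml warnings internally instead of silencing them with @.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Inspect (or log) whatever the parser complained about, then restore the old setting.
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
echo $xpath->query('//h2/a')->length, "\n";
```

Restoring the previous setting keeps the change local, so other code that relies on the default error handling is not affected.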


For example, in the first case, you can use something like this to find all <a> tags that are children of an <h2> tag:

// <h2><a ....>_____</a></h2> 
$tags = $xpath->query('//h2/a'); 
foreach ($tags as $tag) { 
    var_dump($tag->nodeValue); 
} 
echo '<hr />'; 


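Then, for the second and third cases, you are searching for <a> tags that are children of a <cite> tag. Once you have found them, check whether or not they have an href attribute: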

// <cite><a href="_____" .... >...</a></cite>
// <cite><a .... >________</a></cite>
$tags = $xpath->query('//cite/a');
foreach ($tags as $tag) {
    if ($tag->hasAttribute('href')) {
        var_dump($tag->getAttribute('href'));
    } else {
        var_dump($tag->nodeValue);
    }
}
echo '<hr />';
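
In the sample string every `<cite><a>` link carries an `href`, so the `if`/`else` above prints only the URL and never reaches the link text. A minimal sketch that collects both values for each link might look like this; the shortened `$html` snippet and the `$links` array are made up for illustration:

```php
<?php
// Hypothetical, shortened stand-in for one ad block from the question.
$html = '<div class="ad"><cite><a href="http://www.google.com/aclk?sa=L&amp;num=1" '
      . 'class="domainLink" target="_parent">www.Crackle.com/Spiderman</a></cite></div>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = [];
// The [@href] predicate keeps only <a> elements that carry an href attribute.
foreach ($xpath->query('//cite/a[@href]') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'), // entities such as &amp; arrive already decoded
        'text' => trim($a->nodeValue),
    ];
}

var_dump($links);
```

Filtering in the XPath expression rather than with `hasAttribute()` is only a matter of taste here; both approaches select the same elements.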


Finally, for the last case, you just want the <span> tags:

// <span>_________</span> 
$tags = $xpath->query('//span'); 
foreach ($tags as $tag) { 
    var_dump($tag->nodeValue); 
} 
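
The `<span>` elements in the sample contain `<br>` and `<b>` children plus line breaks, so `nodeValue` returns their text with the original whitespace left in. A small sketch, assuming you want each description as a single line, could normalize it as follows; the shortened `$html` and the `$descriptions` array are hypothetical:

```php
<?php
// Hypothetical, shortened stand-in for one <span> from the question.
$html = "<div><span>Great Selection of <b>Batman</b> &amp; Batgirl \n<br> \nCostumes For Kids.</span></div>";

$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$descriptions = [];
foreach ($xpath->query('//span') as $span) {
    // nodeValue concatenates the text of all descendants; collapse whitespace runs to one space.
    $descriptions[] = trim(preg_replace('/\s+/', ' ', $span->nodeValue));
}

var_dump($descriptions);
```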


Not that hard, and much easier to read than a regular expression, isn't it? ;-)

+0

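Do you know how to debug the if ($tag->hasAttribute(...)) block? I'm sure the tag I'm searching for has the attribute I'm searching for, but it isn't recognized, so everything between the parent tags gets dumped instead. I.e. ... I'm searching for the onload attribute of the body tag, and it ignores the onload attribute and prints everything between the opening and closing body tags. – EllaJo 2011-05-13 14:43:23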