2012-11-15

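Finding the class name in HTML source using PHP

I am new to PHP. I want to write code that finds the id specified in the HTML below, i.e. 1123. Can anyone give me some ideas?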

<span class="miniprofile-container /companies/1123?miniprofile=" 
     data-tracking="NUS_CMPY_FOL-nhre" 
     data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&amp;fc=2"> 
    <strong> 
     <a href="http://www.linkedin.com/nus-trk?trkact=viewCompanyProfile&pk=biz-overview-public&pp=1&poster=&uid=5674666402166894592&ut=NUS_UNIU_FOLLOW_CMPY&r=&f=0&url=http%3A%2F%2Fwww%2Elinkedin%2Ecom%2Fcompany%2F1123%3Ftrk%3DNUS_CMPY_FOL-nhre&urlhash=7qbc"> 
     Bank of America 
     </a> 
    </strong> 
</span> has a new Project Manager 

Note: I don't need the content inside the span. I need the id that appears in the span's class name.

I tried the following:

$dom = new DOMDocument('1.0', 'UTF-8'); 
@$dom->loadHTML($html); 
$xmlElements = simplexml_import_dom($dom); 
$id = $xmlElements->xpath("//span [@class='miniprofile-container /companies/$data_id?miniprofile=']"); 

...but I don't know how to proceed from here.

Could you explain what you have tried so far? – Carsten

Answers

Depending on your needs, you could do:

$matches = array(); 
preg_match('|<span class="miniprofile-container /companies/(\d+)\?miniprofile|', $html, $matches); 
print_r($matches); 

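This is a very simple regular expression, but it can serve as a first suggestion. If you want to go through DOMDocument or SimpleXML, you can't mix the two the way you did in your example. Which approach do you prefer? Then we can narrow this down.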
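As a minimal sketch of the regex approach above, the match can be wrapped in a small function so that a missing span yields `null` instead of an empty `$matches` array. The function name `findCompanyId` is hypothetical and not part of the original answer. The pattern uses `\s+` where the original has a single space, which is an assumption about how the markup may vary.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns the numeric
// company id found in the span's class attribute, or null if none is found.
function findCompanyId(string $html): ?string
{
    // Same idea as the regex above; \s+ tolerates extra whitespace (assumption).
    $pattern = '|<span\s+class="miniprofile-container\s+/companies/(\d+)\?miniprofile|';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1]; // first capture group: the digits after /companies/
    }

    return null;
}

$html = '<span class="miniprofile-container /companies/1123?miniprofile=" data-tracking="NUS_CMPY_FOL-nhre">Bank of America</span>';

var_dump(findCompanyId($html));            // string(4) "1123"
var_dump(findCompanyId('<p>no span</p>')); // NULL
```

This only works while `class` is the first attribute of the span and is double-quoted. If that is not guaranteed, a DOM-based approach is the more robust choice.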

// Edit: this is pretty much what @fireeyedboy said, but here is what I just put together:

<?php 
$html = <<<EOD 
<html><head></head> 
<body> 
<span class="miniprofile-container /companies/1123?miniprofile=" 
     data-tracking="NUS_CMPY_FOL-nhre" 
     data-li-getjs="http://s.c.lnkd.licdn.com/scds/concat/common/js?h=dyt8o4nwtaujeutlgncuqe0dn&amp;fc=2"> 
    <strong> 
     <a href="#"> 
     Bank of America 
     </a> 
    </strong> 
</span> has a new Project Manager 

</body> 
</html> 
EOD; 

$domDocument = new DOMDocument('1.0', 'UTF-8'); 
$domDocument->recover = TRUE; 
$domDocument->loadHTML($html); 

$xPath = new DOMXPath($domDocument); 
$relevantElements = $xPath->query('//span[contains(@class, "miniprofile-container")]'); 
$foundId = NULL; 
foreach($relevantElements as $match) { 
    $pregMatches = array(); 
    if (preg_match('|/companies/(\d+)\?miniprofile|', $match->getAttribute('class'), $pregMatches)) { 
     if (isset($pregMatches[1])) { 
      $foundId = $pregMatches[1]; 
      break; 
     } 
    }
} 

echo $foundId; 

?> 
I prefer DOM. –

I used the same code to extract the id from the following HTML, but it doesn't work... can you help me...

This should do what you're after:

$dom = new DOMDocument('1.0', 'UTF-8'); 
@$dom->loadHTML($html); 
$xpath = new DOMXPath($dom); 

/*
 * The following XPath query finds the class attribute of every span element
 * whose class attribute contains both " miniprofile-container " and " /companies/".
 */
$nodes = $xpath->query("//span[contains(concat(' ', @class, ' '), ' miniprofile-container ') and contains(concat(' ', @class, ' '), ' /companies/')]/@class"); 
foreach ($nodes as $node) {
    // extract the number found between "/companies/" and "?miniprofile" in the node's nodeValue
    // (guarded, so a non-matching class attribute does not trigger an undefined-index notice)
    if (preg_match('#/companies/(\d+)\?miniprofile#', $node->nodeValue, $matches)) {
        var_dump($matches[1]);
    }
}
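If the page contains several such spans, the same DOM approach can collect every id. The sketch below is one possible variation, not the answerer's code: the function name `findCompanyIds` is hypothetical, `libxml_use_internal_errors()` stands in for the `@` operator to keep parser warnings quiet, and `normalize-space()` is added on the assumption that the class attribute may contain irregular whitespace.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns all company ids
// found in the class attributes of "miniprofile-container" spans.
function findCompanyIds(string $html): array
{
    // Buffer libxml parse warnings internally instead of silencing them with "@".
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->loadHTML($html);

    // Discard the buffered warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Select the class attribute of spans that carry the miniprofile-container token.
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' miniprofile-container ')]/@class";

    $ids = [];
    foreach ($xpath->query($query) as $attr) {
        // Pull the digits between "/companies/" and "?miniprofile".
        if (preg_match('#/companies/(\d+)\?miniprofile#', $attr->nodeValue, $m) === 1) {
            $ids[] = $m[1];
        }
    }

    return $ids;
}

$html = '<html><body>'
      . '<span class="miniprofile-container /companies/1123?miniprofile=">Bank of America</span>'
      . '<span class="miniprofile-container /companies/456?miniprofile=">Other</span>'
      . '<span class="unrelated">skip</span>'
      . '</body></html>';

print_r(findCompanyIds($html));
```

With the sample markup above, the returned array holds the two ids as strings, in document order. Spans without a `/companies/<digits>?miniprofile` part in their class are skipped.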