2013-10-16 64 views

Parsing a value from an HTML snippet using PHP

I have the following HTML:

<span class="orig_line"> 
<a class="original" href="http://nucleify.org/">Nucleify <i class="externalLink icon-circle-arrow-right"></i></a> 
&middot; 

by <span class="author">Random Person</span> 
&middot; 
October 1, 2013 
</span> 

I am using the Simple HTML DOM parser class, which is available on SourceForge. Here is the sample code I am using:

$newoutput = str_get_html($htmlCode); 
$html = new simple_html_dom(); 
$html->load($newoutput); 
foreach($html->find('div#titlebar') as $date){ 
$n['date'] = $date->find('span.orig_line',0)->plaintext; 
print $n['date']; 
} 

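What I want is only the date text, October 1, 2013, from inside the span (.orig_line), stripped of any further HTML tags within it: just the text. I cannot find a way to do it...

PS: I want to stick with the SimpleHTMLDom class only, no phpQuery or DOMParsers.

Thanks.

Answers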

2

Since "simple_html_dom" is largely based on regular expressions, you can use a regular expression to match the date in the plaintext, like this:

require 'simple_html_dom.php'; 

$htmlCode = ' 
<div id="titlebar"> 
<span class="orig_line"> 
<a class="original" href="http://nucleify.org/">Nucleify <i class="externalLink icon-circle-arrow-right"></i></a> 
&middot; 

by <span class="author">Random Person</span> 
&middot; 
October 1, 2013 
</span> 
</div>'; 

$html = new simple_html_dom(); 
$html->load($htmlCode); 

foreach ($html->find('div#titlebar') as $date) 
{ 
    $n = []; 
    $plaintext = $date->find('span.orig_line', 0)->plaintext; 
    // Match "Month D, YYYY"; only read $matches[0] when a date was actually found
    if (preg_match('#[A-Z][a-z]+ \d{1,2}, \d{4}#is', $plaintext, $matches)) {
        $n['date'] = $matches[0];
    }
    var_dump($n); # array (size=1) 'date' => string 'October 1, 2013' (length=15) 
} 
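
If the date is always the last item after the final `&middot;` separator, the text can also be isolated without matching a date pattern at all. The following is only a minimal sketch under that assumption: `extractTrailingDate` is a hypothetical helper name, and the input string is hard-coded to stand in for whatever `->plaintext` returns, so the snippet runs without the library.

```php
<?php
// Hypothetical helper: returns the text that follows the last middle-dot separator.
function extractTrailingDate(string $plaintext): string
{
    // Decode entities so "&middot;" and a literal middle dot are treated alike.
    $decoded = html_entity_decode($plaintext, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Split on the middle dot (U+00B7) and keep the last segment.
    $parts = explode("\u{00B7}", $decoded);
    $last = end($parts);

    // Collapse runs of whitespace (including newlines) and trim the ends.
    return trim(preg_replace('/\s+/u', ' ', $last));
}

// Stand-in for $date->find('span.orig_line', 0)->plaintext (assumed shape).
$plaintext = "Nucleify &middot; \n\nby Random Person &middot; \nOctober 1, 2013 ";
echo extractTrailingDate($plaintext), "\n"; // October 1, 2013
```

This relies on the layout of the markup rather than on the date format, so it breaks if another item is ever appended after the date.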

OK, this seems good. Would it be better to use the following instead, so as to rule out any other text that might appear? {code} (?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Sept|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)+ \d{1,2}, \d{4} {/code} – Ahsan
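
The stricter month-name pattern suggested in this comment could be applied roughly as follows. This is a sketch, not a tested drop-in: `findDate` is a hypothetical helper name, the word boundaries are an added assumption, and the trailing `+` quantifier after the month group is left out on the assumption that exactly one month name is expected.

```php
<?php
// Month alternation as proposed in the comment (full and abbreviated English names).
$months = 'Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?'
        . '|Aug(?:ust)?|Sep(?:tember)?|Sept|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?';

// "Month D, YYYY", anchored on word boundaries so that a capitalised word
// such as "Person" is not mistaken for a month.
$pattern = '#\b(?:' . $months . ') \d{1,2}, \d{4}\b#';

// Hypothetical helper: returns the first matching date, or null if none is found.
function findDate(string $text, string $pattern): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

echo findDate('by Random Person &middot; October 1, 2013', $pattern), "\n"; // October 1, 2013
var_dump(findDate('Person 1, 2013', $pattern)); // NULL
```

Compared with `[A-Z][a-z]+`, this rejects arbitrary capitalised words in front of a number, at the cost of only recognising English month names.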