Parsing HTML in PHP and extracting a value

I have a fragment that looks like this:

<th>Some text here</th><td>text to extract</td>

I want to find (with a regex or any other solution) the part that starts with Some text here, and extract text to extract.

I tried the following regex-based solution:

$reg = '/<th>Some text here<\/th><td>(.*)<\/td>/'; 
preg_match_all($reg, $content, $result, PREG_PATTERN_ORDER); 

print_r($result);

but it gives me only empty arrays:

Array ([0] => Array () [1] => Array ())

How should I construct my regex to extract the value I need? Or is there another solution I could use to extract it?


2016-08-05 Gacek

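This works fine... cannot reproduce your problem... – Bobot

Can confirm, @Bob0t, it works fine. At least the regex is correct. –

@mmm: that explanation has nothing to do with modern regex engines (in particular the one PHP uses); it is about "regular expressions" in the computer science sense. In short, the current question is not a duplicate of that one, because it is about something different (if you try to apply that explanation to the regex engines used in PHP, Perl, Ruby, .NET, etc., it becomes incorrect). –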

Use XPath:

$dom = new DOMDocument; 
libxml_use_internal_errors(true); 
$dom->loadHTML($html); 
libxml_clear_errors(); 

$xp = new DOMXPath($dom); 

$content = $xp->evaluate('string(//th[.="Some text here"]/following-sibling::*[1][name()="td"])'); 

echo $content;

Details of the XPath query:

string( # return a string instead of a node list 
    // # anywhere in the DOM tree 
    th # a th node 
    [.="Some text here"] # predicate: its content is "Some text here" 
    /following-sibling::*[1] # first following sibling 
    [name()="td"] # predicate: must be a td node 
)
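For reference, here is a self-contained sketch of the same approach that can be run as-is. The sample markup in $html is made up for illustration, and the use of normalize-space() and self::td is my own variation on the query above (it tolerates stray whitespace around the th text and collapses line breaks in the result), not something the original query requires.

```php
<?php
// Hypothetical sample input; the td content deliberately spans two lines.
$html = "<table><tr><th>Some text here</th><td>text to\nextract</td></tr></table>";

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

// normalize-space() trims the th text before comparing it and
// collapses runs of whitespace in the returned string.
$query = 'normalize-space(//th[normalize-space(.)="Some text here"]'
       . '/following-sibling::*[1][self::td])';
$value = $xp->evaluate($query);

var_dump($value);
```

If no matching th/td pair exists, evaluate() returns an empty string here, so an empty result should be treated as "not found".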

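The reason your pattern doesn't work is probably that the td content contains newline characters (which the dot does not match by default).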
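If that guess is right, a minimal sketch of a regex fix would be to add the s modifier, so the dot also matches newlines, and to make the quantifier lazy so it stops at the first closing td. The sample $content below is an assumption made up to reproduce the suspected situation; the real page may differ.

```php
<?php
// Hypothetical input: the td content contains a newline.
$content = "<th>Some text here</th><td>text to\nextract</td>";

// Original pattern: without the s modifier the dot cannot cross the newline.
$matchedWithout = preg_match('/<th>Some text here<\/th><td>(.*)<\/td>/', $content);

// s: dot also matches newlines; .*? is lazy, so it stops at the first </td>.
$reg = '/<th>Some text here<\/th><td>(.*?)<\/td>/s';
preg_match_all($reg, $content, $result, PREG_PATTERN_ORDER);

var_dump($matchedWithout);   // number of matches for the original pattern
print_r($result[1]);         // captured td contents
```

This still depends on the markup matching the pattern character for character (no attributes, no whitespace between the tags), which is why the DOM-based approach is usually the more robust choice.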


2016-08-05 17:55:13

Great solution, thanks! – Gacek

You can use a DOMDocument.

$domd=new DOMDocument(); 
@$domd->loadHTML($content); // @ suppresses warnings about malformed HTML 
$extractedText=NULL; 
foreach($domd->getElementsByTagName("th") as $ele){ 
    if($ele->textContent!=='Some text here'){continue;} 
    $extractedText=$ele->nextSibling->textContent; 
    break; 
} 
if($extractedText===NULL){ 
//extraction failed 
} else { 
//extracted text is in $extractedText 
}

(Regex is generally a bad tool for parsing HTML, as someone has already pointed out in the comments.)
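One thing to watch with nextSibling: it returns the next node of any kind, so if a whitespace text node ends up between the th and the td, the snippet above would read that instead of the cell. A hedged variant, assuming made-up sample markup with a line break between the two cells, walks forward until it reaches an element node:

```php
<?php
// Hypothetical input with a line break between the two cells.
$content = "<table><tr><th>Some text here</th>\n<td>text to extract</td></tr></table>";

$domd = new DOMDocument();
@$domd->loadHTML($content); // @ suppresses warnings about malformed HTML

$extractedText = null;
foreach ($domd->getElementsByTagName('th') as $ele) {
    if (trim($ele->textContent) !== 'Some text here') { continue; }
    // Skip text and comment nodes until an element node is found.
    $sibling = $ele->nextSibling;
    while ($sibling !== null && $sibling->nodeType !== XML_ELEMENT_NODE) {
        $sibling = $sibling->nextSibling;
    }
    // Only accept the sibling if it really is a td.
    if ($sibling !== null && $sibling->nodeName === 'td') {
        $extractedText = $sibling->textContent;
    }
    break;
}

var_dump($extractedText);
```

The null check also covers the case where the th is the last node in its row, in which case $extractedText simply stays null.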


2016-08-05 17:10:39 hanshenrik
