Parsing HTML with a regular expression

I want to parse an HTML document and get the nicknames of all users.

They all have this format:

<a href="/nickname_u_2412477356587950963">Nickname</a>

How can I do this with a regular expression in PHP? I can't use DOMElement or simple HTML parsing.

Source

2011-08-14 André Cardoso

Obligatory: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Oded

Out of pure curiosity, why can't you use an HTML parser? –

You don't need a regular expression; you can do this with [DOMDocument::loadHTML()](http://www.php.net/manual/en/domdocument.loadhtml.php). – arnaud576875

Here is a working solution that does not use a regular expression:

DOMDocument::loadHTML() is forgiving enough to work on malformed HTML.

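<?php
    // Load the markup; loadHTML() tolerates imperfect HTML.
    $doc = new DOMDocument;
    $doc->loadHTML('<a href="/nickname_u_2412477356587950963">Nickname</a>');

    // Select every link whose href starts with "/nickname".
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//a[starts-with(@href, "/nickname")]');

    foreach ($nodes as $node) {
        $username = $node->textContent;        // the link text is the nickname
        $href = $node->getAttribute('href');   // e.g. /nickname_u_2412477356587950963
        printf("%s => %s\n", $username, $href);
    }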
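
As a rough sketch of how the same approach might look on a page with several links, the snippet below collects every nickname into an array keyed by its href. The sample markup and the names "Alice" and "Bob" are made up for illustration, and the call to libxml_use_internal_errors() is an addition that is not part of the answer above; it only keeps libxml warnings about imperfect markup out of the output.

```php
<?php
// Hypothetical sample input: two profile links and one unrelated link.
$html = '<p><a href="/nickname_u_2412477356587950963">Alice</a>'
      . '<a href="/about">About</a>'
      . '<a href="/nickname_u_77">Bob</a></p>';

// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nicknames = [];

// The nickname text and the digits both vary; only the href prefix is fixed.
foreach ($xpath->query('//a[starts-with(@href, "/nickname_u_")]') as $node) {
    $nicknames[$node->getAttribute('href')] = $node->textContent;
}

print_r($nicknames);
```

Because the XPath expression only tests the start of the href, it does not matter what the nickname or the number actually is.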

Source

2011-08-14 16:56:35 arnaud576875

Thanks, but you didn't understand what I meant: "Nickname" is a variable, and so is that random set of digits –

preg_match_all(
    '{                # match when
        nickname_u_   # there is nickname_u_
        \d+           # followed by one or more digits
        ">            # followed by quote and closing bracket
        (.*?)         # lazily capture anything that follows
        </a>          # until the first </a> sequence
    }xm',
    '<a href="/nickname_u_2412477356587950963">Nickname</a>',
    $matches
);
print_r($matches);

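The usual disclaimer about using a regular expression on HTML instead of an HTML parser applies. The above can probably be improved to match more reliably. It will work for the example you gave though.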
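
One possible way to tighten the match, sketched under the assumption that every link looks exactly like the example in the question (double-quoted href as the only attribute, no nested tags inside the link): capture the digits as well, and stop the nickname at the next "<". The helper name extractNicknames() and the sample input are hypothetical.

```php
<?php
// extractNicknames() is a hypothetical helper, not part of the answer above.
function extractNicknames(string $html): array
{
    $pattern = '{
        <a\s+href="/nickname_u_(\d+)">  # opening tag, capturing the numeric id
        ([^<]*)                         # nickname: everything up to the next tag
        </a>
    }x';

    // PREG_SET_ORDER yields one entry per link: [full match, id, nickname].
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $result[] = ['id' => $m[1], 'nickname' => $m[2]];
    }
    return $result;
}

// Made-up input with two profile links and one unrelated link.
$html = '<a href="/nickname_u_2412477356587950963">Alice</a> '
      . '<a href="/about">About</a> '
      . '<a href="/nickname_u_77">Bob</a>';

print_r(extractNicknames($html));
```

This still breaks as soon as the markup deviates from that shape (extra attributes, single quotes, markup inside the link), which is the usual argument for a parser.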

Source

2011-08-14 17:02:19 Gordon
