How to create a pattern for preg_match_all

I tried searching Google but could not find any clear information. First, I'd like someone to help me write a pattern that gets the information between these tags:

<vboxview leftinset="10" rightinset="0" stretchiness="1"> // CONTENT INSIDE HERE </vboxview>

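Second, could you also explain in detail what each part of the pattern does, and how to specify that only a certain part of the code should be captured?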

Source

2011-11-08 Ahoura Ghotbi

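Use an XML parser. Regexes are not designed for parsing XML or HTML. – Cfreak

OK, but then what is preg_match_all used for? On php.net they actually show an example that parses HTML with it. –

@AhouraGhotbi - Yes, that is a bad example and they should change it. Regular expressions are for parsing data that follows a pattern. By definition, XML and HTML have no fixed textual layout. You can parse them with a regex, but it is not a good idea, because nothing requires the file to be laid out in a particular way. In other words, someone can give you an XML file that conforms to your specification and your program will still break. – Cfreak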

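See my comment on the question for my rant about SGML-based languages and regex...

Now, on to my answer.

If you know that there will not be any other HTML/XML elements inside the tag in question, then this will work quite well: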

<vboxview\s+(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>

Broken down, this expression says:

<vboxview                   # match `<vboxview` literally
\s+                         # match at least one whitespace character
(?P<vboxviewAttributes>     # begin capture (into a group named "vboxviewAttributes")
    (\\>|[^>])*             # any number of (either `\>` or NOT `>`)
)                           # end capture
>                           # match a `>` character
(?P<vboxviewContent>        # begin capture (into a group named "vboxviewContent")
    (\\<|[^<])*             # any number of (either `\<` or NOT `<`)
)                           # end capture
</vboxview>                 # match `</vboxview>` literally

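You will need to escape any > characters in your source by writing them as \> (and any < inside the content as \<), or better yet, write them as HTML/XML entities.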

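If there are going to be nested constructs inside, then you will either start running into problems with regex, or you will already have decided to use another method that does not involve regex. Either one is sufficient!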
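To tie this back to preg_match_all and the question of how to pick out one part, here is a minimal sketch. The ~ delimiter, the nowdoc syntax and the variable names $pattern, $input and $matches are my own choices for illustration, not part of the answer; the pattern itself is the one broken down above.

```php
<?php
// A nowdoc keeps the backslashes exactly as written in the pattern.
// ~ is used as the delimiter so the / in </vboxview> needs no escaping.
$pattern = <<<'REGEX'
~<vboxview\s+(?P<vboxviewAttributes>(\\>|[^>])*)>(?P<vboxviewContent>(\\<|[^<])*)</vboxview>~
REGEX;

// Sample input taken from the question
$input = '<vboxview leftinset="10" rightinset="0" stretchiness="1"> // CONTENT INSIDE HERE </vboxview>';

// PREG_SET_ORDER yields one array per match, keyed by group name and number
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $match) {
    // Named groups are read by the name given in (?P<name>...)
    echo 'attributes: ', $match['vboxviewAttributes'], "\n";
    echo 'content:    ', trim($match['vboxviewContent']), "\n";
}
```

Reading $match['vboxviewContent'] is how you get just one part of the match; without PREG_SET_ORDER the same data is available as $matches['vboxviewContent'][0], grouped by capture group instead of by match.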

Source

2011-11-08 20:15:50

Thanks a lot :) I was looking into other methods, but for now this seems fine –

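As mentioned in the comments, trying to extract things from HTML with regular expressions is usually not a good idea. If you ever want to switch to a more bulletproof approach, here is how to extract the information easily with the DOMDocument API.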

<?php 
function get_vboxview($html) { 

    $output = array(); 

    // Create a new DOM object 
    $doc = new DOMDocument; 

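    // load a string in as html; vboxview is not a standard HTML tag,
    // so keep libxml's parser warnings out of the output
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);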

    // create a new Xpath object to query the document with 
    $xpath = new DOMXPath($doc); 

    // an xpath query that looks for a vboxview node anywhere in the DOM 
    // with an attribute named leftinset set to 10, an attribute named rightinset 
    // set to 0 and an attribute named stretchiness set to 1 
    $query = '//vboxview[@leftinset=10 and @rightinset=0 and @stretchiness=1]'; 

    // query the document 
    $matches = $xpath->query($query); 

    // loop through each matching node
    // and add its textContent to the output
    foreach ($matches as $m) { 
      $output[] = $m->textContent; 
    } 

    return $output; 
} 
?>
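If the attribute values are not known in advance, a variation on the same idea can return the attributes together with the text, much like the two named groups in the regex answer. This is only a sketch: the function name get_vboxviews_with_attributes and the shape of the returned array are hypothetical, and it relies on loadHTML keeping the non-standard vboxview tag as an ordinary element.

```php
<?php
// Hypothetical helper (not from the answer): collect the attributes and
// text of every <vboxview> element found in $html.
function get_vboxviews_with_attributes($html) {
    $output = array();

    // vboxview is not a standard HTML tag, so collect libxml's
    // parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('vboxview') as $node) {
        // copy each attribute into a name => value array
        $attributes = array();
        foreach ($node->attributes as $attr) {
            $attributes[$attr->name] = $attr->value;
        }
        $output[] = array(
            'attributes' => $attributes,
            'content'    => trim($node->textContent),
        );
    }

    return $output;
}

$html = '<vboxview leftinset="10" rightinset="0" stretchiness="1">CONTENT INSIDE HERE</vboxview>';
print_r(get_vboxviews_with_attributes($html));
```

Unlike the regex, this keeps working when the attributes appear in a different order or when the element contains nested markup, because textContent concatenates the text of all descendants.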

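Better still, if there is guaranteed to be only one vboxview in your input (and assuming you have control over the HTML), you can add an id attribute to the vboxview and cut the code down to a shorter, more general function.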

<?php 
function get_node_text($html, $id) { 
    // Create a new DOM object 
    $doc = new DOMDocument; 

    // load a string in as html 
    $doc->loadHTML($html); 

    // return the textContent of the node with the id $id,
    // or null if no such node exists
    $node = $doc->getElementById($id);
    return $node ? $node->textContent : null;
} 
?>

Source

2011-11-09 08:38:02
