PHP preg_match capture groups

I can't seem to get the hang of regular expressions in PHP. Specifically, the group-capturing part.

I have a string that looks like this

<table cellpadding="0" cellspacing="0" border="0" width="100%" class="List"> 

    <tr class='row_type_1'> 
    <td class="time"> 
         3:45 pm 
    </td> 
    <td class="name"> 
         Kira 
    </td> 
    </tr> 

    <tr class='row_type_2'> 
    <td class="time"> 
         4:00 pm 
    </td> 
    <td class="name"> 
         Near 
    </td> 
    </tr> 

</table>

And I want my array to look like this

Array 
(
    [0] => Array 
    (
     [0] => 3:45 pm 
     [1] => Kira 
    ) 
    [1] => Array 
    (
     [0] => 4:00 pm 
     [1] => Near 
    ) 
)

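I only want to use preg_match, not explode, array_keys, or loops. It took me a while to figure out that I needed the /s modifier so that the dot also matches newlines; I'm very eager to see the pattern and the capture syntax.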
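
For readers unfamiliar with the /s modifier mentioned above, here is a minimal sketch of its effect. The $html string is a made-up one-cell sample, not the full table from the question.

```php
<?php
// Hypothetical sample: one table cell whose content spans several lines.
$html = "<td class=\"time\">\n     3:45 pm\n</td>";

// Without /s, "." does not match newlines, so the pattern cannot cross them.
$withoutS = preg_match('~<td class="time">(.*?)</td>~', $html, $m1);

// With /s (PCRE_DOTALL), "." also matches newlines.
$withS = preg_match('~<td class="time">(.*?)</td>~s', $html, $m2);

var_dump($withoutS);      // int(0)
var_dump($withS);         // int(1)
var_dump(trim($m2[1]));   // string(7) "3:45 pm"
```

The first capture group is available as $m2[1]; index 0 always holds the full match.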

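Edit: The pattern just needs something like (row_type_1|row_type_2) so that it captures only the two types of rows in the table that I want data from. For example, if row_type_2 is followed by row_type_3, which is then followed by row_type_1, row_type_3 would be ignored and the array would only gain the data from row_type_1, as shown below.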

Array 
(
    [0] => Array 
    (
     [0] => 3:45 pm 
     [1] => Kira 
    ) 
    [1] => Array 
    (
     [0] => 4:00 pm 
     [1] => Near 
    ) 
    [2] => Array 
    (
     [0] => 5:00 pm 
     [1] => L 
    ) 
)

Source

2013-04-18 Satbir Kira

Never process HTML with regular expressions; use a DOM parser instead. – erenon

Can you explain why? –

@SatbirKira: Because you won't get it right. If the markup changes even slightly, your regex will break. Use an HTML parser. –

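I would use XPath and DOM to retrieve information from the HTML. If the HTML or the query gets more complex, regular expressions can get messy (as you are currently seeing). DOM and XPath are the standard tools for this. Why not use them?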

Imagine a code example like this:

// load the HTML into a DOM tree 
$doc = new DOMDocument(); 
$doc->loadHTML($html); 

// create XPath selector 
$selector = new DOMXPath($doc); 

// grab results 
$result = array(); 
// select every tr whose class starts with 'row_type_' 
foreach($selector->query('//tr[starts-with(@class, "row_type_")]') as $tr) { 
    $record = array(); 
    // select the value of the inner td nodes 
    foreach($selector->query('td[@class="time"]', $tr) as $td) { 
     $record[0]= trim($td->nodeValue); 
    } 
    foreach($selector->query('td[@class="name"]', $tr) as $td) { 
     $record[1]= trim($td->nodeValue); 
    } 
    $result []= $record; 
} 

var_dump($result);
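
Since the question's edit asks for only row_type_1 and row_type_2, here is a hedged variant of the same idea that names the two classes explicitly in the XPath expression. The sample $html (including the "Skipped" row) is invented for illustration, and the libxml_use_internal_errors() call is my addition to keep parser warnings quiet.

```php
<?php
// Hypothetical sample input modelled on the question, plus a row_type_3 row.
$html = <<<'HTML'
<table class="List">
    <tr class='row_type_1'><td class="time"> 3:45 pm </td><td class="name"> Kira </td></tr>
    <tr class='row_type_2'><td class="time"> 4:00 pm </td><td class="name"> Near </td></tr>
    <tr class='row_type_3'><td class="time"> 4:30 pm </td><td class="name"> Skipped </td></tr>
    <tr class='row_type_1'><td class="time"> 5:00 pm </td><td class="name"> L </td></tr>
</table>
HTML;

// Collect libxml parse warnings instead of emitting them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$result = array();
// Only rows whose class attribute is exactly row_type_1 or row_type_2.
foreach ($xpath->query('//tr[@class="row_type_1" or @class="row_type_2"]') as $tr) {
    $result[] = array(
        // string(...) returns the text of the first matching td, relative to $tr
        trim($xpath->evaluate('string(td[@class="time"])', $tr)),
        trim($xpath->evaluate('string(td[@class="name"])', $tr)),
    );
}

print_r($result);
```

With this sample the row_type_3 row is left out, so $result holds three records.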

Source

2013-04-18 19:58:31 hek2mgl

Thanks for pointing me in the right direction. I'm going to try a library called "PHP Simple HTML DOM Parser". –

If you like it, go ahead. It's much better than regex. :) I prefer DOMXPath because it is built into PHP, so it 1.) works out of the box and 2.) is faster – hek2mgl

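I can't say DOMXPath looks like something I'd be comfortable going back to fix if the site I'm scraping changes its HTML. I have the luxury of my own server space, so I can use external libraries. Funnily enough, one of the first projects in my university C++/bash/shell course was to use egrep to scrape their website. Clearly I should have realized that it was impractical and only meant as an example. –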

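There are several reasons you shouldn't use regular expressions to parse HTML. The biggest is that it is hard to account for malformed HTML, and the pattern can become large and slow.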

I would suggest looking into PHP's DOM parser or an HTML parser for PHP.
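
To make the fragility concrete, here is a small sketch. The pattern, the two markup variants, and the namesViaDom() helper are all made up for illustration; the point is only that a harmless change in quoting or attribute order defeats a strict pattern while a DOM query keeps working.

```php
<?php
// A strict pattern written against one exact markup shape (illustrative only).
$pattern = '~<td class="name">\s*(.*?)\s*</td>~s';

$original = '<table><tr><td class="name"> Kira </td></tr></table>';
// Same data, but with single quotes and an extra attribute.
$changed  = "<table><tr><td id='n1' class='name'> Kira </td></tr></table>";

var_dump(preg_match($pattern, $original)); // int(1)
var_dump(preg_match($pattern, $changed));  // int(0)

// Hypothetical helper: the DOM query does not depend on quoting or attribute order.
function namesViaDom($html) {
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $names = array();
    foreach ($xpath->query('//td[@class="name"]') as $td) {
        $names[] = trim($td->nodeValue);
    }
    return $names;
}

print_r(namesViaDom($original));
print_r(namesViaDom($changed));
```

Both namesViaDom() calls return the same single name, even though the regex only matches the first variant.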

Source

2013-04-18 20:08:58 jgetner

Try this:

function extractData($str){ 
    preg_match_all("~<tr class='row_type_\d'>\s*<td class=\"time\">(.*)</td>\s*<td class=\"name\">(.*)</td>\s*</tr>~Usim", $str, $match); 
    $dataset = array(); 
    array_shift($match); 
    foreach($match as $rowIndex => $rows){ 
     foreach ($rows as $index => $data) { 
      $dataset[$index][$rowIndex] = trim($data); 
     } 
    } 
    return $dataset; 
} 

$myData = extractData($str);
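
Note that \d in the pattern above accepts any single-digit row type. If only row_type_1 and row_type_2 are wanted, as the question's edit asks, an alternation can be used instead. The following is a sketch under that assumption; the sample $str (including the "Skipped" row) is invented, and array_map() stands in for an explicit loop.

```php
<?php
// Hypothetical sample input modelled on the question's table.
$str = <<<'HTML'
<table>
    <tr class='row_type_1'>
    <td class="time">
         3:45 pm
    </td>
    <td class="name">
         Kira
    </td>
    </tr>
    <tr class='row_type_3'>
    <td class="time"> 4:30 pm </td>
    <td class="name"> Skipped </td>
    </tr>
    <tr class='row_type_2'>
    <td class="time"> 4:00 pm </td>
    <td class="name"> Near </td>
    </tr>
</table>
HTML;

// (?:...) groups the alternation without capturing it;
// the two (.*?) groups capture the time and the name.
$pattern = '~<tr class=\'(?:row_type_1|row_type_2)\'>\s*'
         . '<td class="time">\s*(.*?)\s*</td>\s*'
         . '<td class="name">\s*(.*?)\s*</td>\s*</tr>~s';

// PREG_SET_ORDER yields one array per matched row.
preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

// Keep only the captured groups, dropping the full match at index 0.
$rows = array_map(function ($m) {
    return array($m[1], $m[2]);
}, $matches);

print_r($rows);
```

With this sample, the row_type_3 row is skipped and $rows contains the Kira and Near records.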

Source

2013-04-18 20:42:49 Rafael

The road to hell is here:

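$pattern = '`<tr .*?"time">\s++(.+?)\s++</td>.*?"name">\s++(.+?)\s++</td>`s'; 
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER); 
// drop the full-match entry so only the two captured groups remain 
foreach ($matches as &$match) { 
    array_shift($match); 
} 
unset($match); // break the reference left over from the foreach 
?><pre><?php print_r($matches);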
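
The PREG_SET_ORDER flag is what gives one sub-array per row here. As a rough illustration of the difference from the default ordering, using a tiny made-up subject and pattern rather than the HTML from the question:

```php
<?php
// Hypothetical toy input, chosen only to keep the result arrays short.
$subject = 'a=1 b=2';
$pattern = '~(\w)=(\d)~';

// Default (PREG_PATTERN_ORDER): one array per capture group.
preg_match_all($pattern, $subject, $byGroup);

// PREG_SET_ORDER: one array per match.
preg_match_all($pattern, $subject, $byMatch, PREG_SET_ORDER);

print_r($byGroup); // [['a=1', 'b=2'], ['a', 'b'], ['1', '2']]
print_r($byMatch); // [['a=1', 'a', '1'], ['b=2', 'b', '2']]
```

In both layouts index 0 holds the full match, which is why the answer above shifts it off each entry.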

Source

2013-04-19 00:56:49
