PHP preg_match_all fails on a date containing slashes

I am trying to use preg_match_all to match a date containing slashes that sits between two tags; however, it returns null.

Here is the HTML:

> <td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>

Here is my preg_match_all() code:

preg_match_all('/<td width=\'40%\' align=\'right\' class=\'SmallDimmedText\'>Last([a-zA-Z0-9\s\.\-\',]*)<\/td>/', $h, $table_content, PREG_PATTERN_ORDER);

where $h is the HTML above.

What am I doing wrong?
Thanks in advance.


2009-12-02 phill

From a quick look, it is because you are trying to match:

Last Login: 11/14/2009

with this expression:

Last([a-zA-Z0-9\s\.\-\',]*)

The character class in the regex does not include the : and / characters that appear in the text. Change that part of the regex to:

Last([a-zA-Z0-9\s\.\-\',:/]*)

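which gives a match for that part of the text. Since your pattern uses / as its delimiter, the slash inside the character class has to be written as \/ there.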
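As a rough check of the point above, the sketch below runs the original and the widened character class against just the text between the tags. The `^` and `$` anchors and the variable names are assumptions added for illustration; they are not part of the question's pattern.

```php
<?php
// The text between the tags, taken from the question.
$text = 'Last Login: 11/14/2009';

// Original class: no ':' and no '/', so it cannot get past "Login".
$original = '/^Last([a-zA-Z0-9\s\.\-\',]*)$/';

// Widened class: ':' added as-is, '/' escaped because '/' is also the delimiter.
$widened = '/^Last([a-zA-Z0-9\s\.\-\',:\/]*)$/';

$originalMatches = preg_match($original, $text);
$widenedMatches = preg_match($widened, $text, $m);

var_dump($originalMatches); // int(0)
var_dump($widenedMatches);  // int(1)
var_dump($m[1]);            // string(18) " Login: 11/14/2009"
```

The anchors are only there to make the failure visible; without them the original class would still match the shorter prefix "Last Login".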

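Wouldn't it be better to simply use a DOM parser and then run the regex on the result of the DOM lookup? It makes for much simpler regexes...

Edit

Another problem is that your HTML is: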

...40%' align='right'class='SmallDimmedText'>...

where there is no space between align='right' and class='SmallDimmedText'.

But that section of your regex is:

...40%\' align=\'right\' class=\'SmallDimmedText\'>...

which indicates that there is a space.
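One way to make a pattern tolerate this kind of mismatch is to allow optional whitespace between the attributes. This is only a sketch, not something proposed in the answer itself; the shortened patterns and variable names are assumptions for illustration.

```php
<?php
// The HTML from the question: no space between 'right' and class=.
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";

// Literal space between the attributes, as in the question's pattern.
$strict = "#align='right' class='SmallDimmedText'>#";

// \s* accepts zero or more whitespace characters between the attributes.
$tolerant = "#align='right'\s*class='SmallDimmedText'>#";

$strictResult = preg_match($strict, $h);
$tolerantResult = preg_match($tolerant, $h);

var_dump($strictResult);   // int(0)
var_dump($tolerantResult); // int(1)
```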

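Using a DOM parser will save you more headaches caused by subtle errors than you can count.

Just to give you an idea, here is a simple example using Simple HTML DOM: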

$html = str_get_html(...);
// find() returns an array of the matching elements
$elems = $html->find('.SmallDimmedText');
if (count($elems) != 1) {
    throw new Exception('Too many/few elements found');
}
$text = $elems[0]->plaintext;

// Parsing here is only an example, but you have removed all
// the HTML, so any regex used is really simple.
$date = substr($text, strlen('Last Login: '));
$unixTime = strtotime($date);
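If adding a third-party library is not an option, roughly the same idea can be sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath). This swaps in a different parser from the one used above. The surrounding table markup, the XPath expression and the use of DateTime instead of strtotime are assumptions made for this illustration.

```php
<?php
// Hypothetical input: the question's cell wrapped in a table so the parser keeps it.
$h = "<table><tr><td width='40%' align='right'class='SmallDimmedText'>"
   . "Last Login: 11/14/2009</td></tr></table>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($h);
libxml_clear_errors();

// Select every <td> whose class list contains SmallDimmedText.
$xpath = new DOMXPath($doc);
$cells = $xpath->query(
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' SmallDimmedText ')]"
);
if ($cells->length !== 1) {
    throw new Exception('Too many/few elements found');
}

// With the markup gone, plain string functions are enough.
$text = trim($cells->item(0)->textContent);
$date = substr($text, strlen('Last Login: '));

// '!' resets the unspecified time fields; m/d/Y assumes a US-style date.
$parsed = DateTime::createFromFormat('!m/d/Y', $date);
$iso = $parsed->format('Y-m-d');

echo $text, "\n"; // Last Login: 11/14/2009
echo $iso, "\n";  // 2009-11-14
```

Because the parser handles the attribute spacing, the missing space between align and class no longer matters here.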


2009-12-02 23:52:40 Yacoby

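I see at least two problems:

- In your HTML there is no space between 'right' and class=, but there is one in your regex.
- You must add at least these three characters to the list of matched characters between the []:
  - ':' (there is one between "Login" and the date),
  - ' ' (there are spaces between "Last" and "Login", and between ":" and the date; the \s already in the class covers these),
  - and '/' (between the parts of the date).

With this code, it seems to work better: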

$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>"; 
if (preg_match_all("#<td width='40%' align='right'class='SmallDimmedText'>Last([a-zA-Z0-9\s\.\-',: /]*)<\/td>#", 
     $h, $table_content, PREG_PATTERN_ORDER)) { 
    var_dump($table_content); 
}

I get this output:

array 
    0 => 
    array 
     0 => string '<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>' (length=80) 
    1 => 
    array 
     0 => string ' Login: 11/14/2009' (length=18)

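Note that I also used:

# as the regex delimiter, to avoid escaping the slashes
" as the string delimiter, to avoid having to escape the single quotes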


2009-12-02 23:56:07

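My first suggestion would be to minimize the amount of text in the preg_match_all pattern: why not just match between the ">" and the "<"? Second, I would end up writing the regex something like this, though I am not sure whether it helps:

/>.*[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{2,4}</

This looks for the end of a tag, then any characters, then a date, then the start of another tag.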
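A small sketch of this shorter pattern, assuming you also want to capture the date: a greedy `.*` gives one digit of the month back to the date part only when forced to, so the capture loses the first digit, while a lazy `.*?` keeps the whole date. The capture group, the `#` delimiter and the variable names are additions for illustration.

```php
<?php
$h = "<td width='40%' align='right'class='SmallDimmedText'>Last Login: 11/14/2009</td>";

// Greedy .* consumes as much as it can, leaving a single digit for the month.
$greedy = '#>.*([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})<#';

// Lazy .*? stops at the first position where a full date can start.
$lazy = '#>.*?([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})<#';

preg_match($greedy, $h, $g);
preg_match($lazy, $h, $l);

var_dump($g[1]); // string(9) "1/14/2009"
var_dump($l[1]); // string(10) "11/14/2009"
```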


2009-12-02 23:56:27 gonzofish

I agree with Yacoby.

At the very least, strip out everything HTML-specific and just use this regex:

preg_match_all('#Last Login: ([\d+/?]+)#', ...
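To show how such a call might look when completed, here is a sketch with a slightly stricter date pattern than the one above. The second table cell and the `\d{1,2}/\d{1,2}/\d{4}` form are assumptions for illustration, not part of the question.

```php
<?php
// Hypothetical input: the question's cell plus a second, made-up one.
$h = "<td class='SmallDimmedText'>Last Login: 11/14/2009</td>"
   . "<td class='SmallDimmedText'>Last Login: 3/7/2010</td>";

// PREG_PATTERN_ORDER: $m[0] holds the full matches, $m[1] the first capture group.
$count = preg_match_all(
    '#Last Login: (\d{1,2}/\d{1,2}/\d{4})#',
    $h,
    $m,
    PREG_PATTERN_ORDER
);

var_dump($count); // int(2)
print_r($m[1]);   // the two captured dates: 11/14/2009 and 3/7/2010
```

Spelling out the digit counts avoids accepting strings such as "+/?" that the looser character class would also match.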


2009-12-02 23:57:57
