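Please consider the following script, which contains obfuscated e-mail addresses, and a function that tries to replace them with asterisks using a regular-expression match.
My script tries to catch one of the following tokens: "at", "a t", "a.t", "@",
followed by some text (any domain name), then "dot", "." or "d.o.t",
and then a TLD. How can I capture the following obfuscated e-mail addresses in PHP?
Input: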
$str[] = 'dsfatasdfasdf asd dsfasdf [email protected]';
$str[] = 'I live at school where My address is [email protected]';
$str[] = 'I live at school. My address is [email protected]';
$str[] = 'at school my address is [email protected]';
$str[] = 'dsf a t asdfasdf asd dsfasdf [email protected]';
$str[] = 'd s f d s f a t h o t m a i l . c o m';
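function clean_text($text){
    // The U modifier inverts greediness, so (.+) and the other quantifiers match lazily by default.
    $pattern = '/(\ba[ \.\-_]*t\b|@)[ \.\-_]*(.+)[ \.\-_]*(d[ \.\-_]*o[ \.\-_]*t|\.)[ \.\-_]*(c[ \.\-_]*o[ \.\-_]*m|n[ \.\-_]*e[ \.\-_]*t|o[ \.\-_]*r[ \.\-_]*g|([a-z][ \.\-_]*){2,3}[a-z]?)/iU';
    return preg_replace($pattern, '***', $text);
}
// Print each input with any matched address replaced by ***.
foreach($str as $email){
    echo clean_text($email);
}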
Expected output:
dsfatasdfasdf asd dsfasdf dsfdsf***
I live at school where My address is [email protected]***
I live at school. My address is [email protected]***
at school my address is dsfdsf****
dsf ***
d s f d s f ***
Actual result:
dsfatasdfasdf asd dsfasdf dsfdsf***
I live ***
I live ***
***
dsf ***
d s f d s f ***
Problem: it catches the first occurrence of "at" rather than the last one, so the following happens:
input: 'at school my address is [email protected]'
produces: '****'
should produce: 'at school my address is dsfdsf****'
How can I fix this?
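For what it's worth, the behaviour can be reproduced with a much smaller pattern. The sketch below is only an illustration: user@example.com is a made-up address, and the shortened pattern is an assumption standing in for the one in clean_text(). It suggests that the engine reports the leftmost position where a match can start, and that lazy quantifiers only affect where the match ends.

```php
<?php
// Made-up sample input; the real addresses are not shown in the post.
$text = 'at school my address is user@example.com';

// Simplified stand-in for the full pattern: an "at" token, a lazy run, then ".com".
preg_match('/(?:\bat\b|@).+?\.com/i', $text, $m, PREG_OFFSET_CAPTURE);

// The match starts at offset 0 (the first "at"), not at the "@".
echo $m[0][1], "\n";   // 0
echo $m[0][0], "\n";   // at school my address is user@example.com
```

So making the middle part lazy does not help here: the word "at" at the start is already a valid starting point, and the lazy run simply stretches until it reaches ".com".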
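I guess the obvious question is why the addresses are obfuscated in the first place. If users are doing it on purpose, then whatever scheme you come up with will be worked around. – 2010-07-27 16:10:35
@Douglas, good point. Even if they do work around it, I still want to discourage them. – 2010-07-27 16:11:38
Trying to parse natural language with regular expressions is not the best idea; there are so many nuances that will break whatever you try to build with a regex. It's like trying to parse HTML with regular expressions. – HoLyVieR 2010-07-27 16:13:50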
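With those caveats in mind, one possible way to attack the specific first-"at" problem from the question is to forbid the domain part from running across another "at" or "@" token, using a negative lookahead on every character. This is a rough sketch, not a complete solution: the function name clean_text_tempered and the user@example.com address are made up for illustration, the U modifier is dropped in favour of an explicit lazy quantifier, and the TLD list is narrowed to com, net and org (an assumption, so that ordinary sentences match less often).

```php
<?php
// Hypothetical variant of clean_text(); the function name and sample inputs are illustrative only.
function clean_text_tempered(string $text): string
{
    $sep = '[ .\-_]*';                            // optional filler between letters
    $at  = '(?:\ba' . $sep . 't\b|@)';            // "at", "a t", "a.t" or "@"
    $dot = '(?:d' . $sep . 'o' . $sep . 't|\.)';  // "dot", "d.o.t" or "."
    // Assumption: only three TLDs, to keep ordinary sentences from matching.
    $tld = '(?:c' . $sep . 'o' . $sep . 'm|n' . $sep . 'e' . $sep . 't|o' . $sep . 'r' . $sep . 'g)\b';
    // Domain: as few characters as possible, and never across another "at"/"@" token.
    $domain = '(?:(?!' . $at . ').)+?';

    $pattern = '/' . $at . $sep . $domain . $sep . $dot . $sep . $tld . '/i';

    // preg_replace() returns null on error; fall back to the unchanged text.
    return preg_replace($pattern, '***', $text) ?? $text;
}

echo clean_text_tempered('at school my address is user@example.com'), "\n";
// at school my address is user***
echo clean_text_tempered('dsf a t hotmail dot com'), "\n";
// dsf ***
echo clean_text_tempered('d s f d s f a t h o t m a i l . c o m'), "\n";
// d s f d s f ***
```

Because the attempt that starts at the leading "at" can no longer cross the "@", it fails, and the engine moves on to the "@" itself. It is still easy to fool: a sentence such as "at work. Net income" would be replaced as well, which is the kind of nuance the comments above warn about.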