2010-07-27

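Consider the following script, which contains obfuscated email addresses, and a function that tries to replace them with asterisks using a regular-expression match. My script tries to catch the words "at", "a t", "a.t", or "@", followed by some text (any domain name), then "dot", ".", or "d.o.t", and then the TLD. How can I capture the following obfuscated email addresses in PHP?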

Input:

$str[] = 'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com';
$str[] = 'I live at school where My address is dsfdsf@hotmail.com';
$str[] = 'I live at school. My address is dsfdsf@hotmail.com';
$str[] = 'at school my address is dsfdsf@hotmail.com';
$str[] = 'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com';
$str[] = 'd s f d s f a t h o t m a i l . c o m';

function clean_text($text){ 
    $pattern = '/(\ba[ \.\-_]*t\b|@)[ \.\-_]*(.+)[ \.\-_]*(d[ \.\-_]*o[ \.\-_]*t|\.)[ \.\-_]*(c[ \.\-_]*o[ \.\-_]*m|n[ \.\-_]*e[ \.\-_]*t|o[ \.\-_]*r[ \.\-_]*g|([a-z][ \.\-_]*){2,3}[a-z]?)/iU'; 
    return preg_replace($pattern, '***', $text); 
} 

foreach($str as $email){ 
    echo clean_text($email), "\n";
} 

Expected output:

dsfatasdfasdf asd dsfasdf dsfdsf*** 
I live at school where My address is dsfdsf***
I live at school. My address is dsfdsf***
at school my address is dsfdsf***
dsf *** 
d s f d s f *** 

Result:

dsfatasdfasdf asd dsfasdf dsfdsf*** 
I live *** 
I live *** 
***
dsf *** 
d s f d s f *** 

Problem: it catches the first occurrence of "at" instead of the last one, so the following happens:

input: 'at school my address is dsfdsf@hotmail.com'
produces: '****' 
should produce: 'at school my address is dsfdsf****' 

How can I fix this?
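A minimal sketch of why this happens, assuming PHP's PCRE functions: the engine reports the leftmost position at which the pattern can match, so the "at" at offset 0 wins over the later "@". Putting a greedy "^.*" in front forces the engine to backtrack from the end of the string, which selects the last marker instead. The variable names are illustrative only, and the marker is a simplified version of the question's first group.

```php
<?php
// Sample input taken from the question.
$text = 'at school my address is dsfdsf@hotmail.com';

// Simplified marker: "at"/"a t"/"a.t" as a whole word, or "@".
$marker = '(?:\ba[ ._-]*t\b|@)';

// Plain search: PCRE returns the LEFTMOST match, i.e. the "at" at offset 0.
preg_match('/' . $marker . '/i', $text, $first, PREG_OFFSET_CAPTURE);

// Greedy "^.*" prefix: ".*" first swallows the whole line, then backtracks
// one character at a time, so the marker it settles on is the LAST one.
preg_match('/^.*(' . $marker . ')/i', $text, $last, PREG_OFFSET_CAPTURE);

echo 'first marker at offset ', $first[0][1], "\n"; // 0
echo 'last marker at offset ', $last[1][1], "\n";   // 30
```

The greedy prefix only decides which marker the match starts from; the masking still has to be built on top of it.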

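I guess the obvious question is: why are the addresses obfuscated? If users are doing it deliberately, they will work around any scheme you come up with. – 2010-07-27 16:10:35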

@Douglas, good point. Even if they work around it, I still want to discourage them. – 2010-07-27 16:11:38

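Trying to parse natural language with regular expressions is not the best idea; there are so many nuances that will break anything you try to build with a regex. It's like trying to parse HTML with regular expressions. – HoLyVieR 2010-07-27 16:13:50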

Answers

Based on M42's regex:

Code:

$emails = array(
       'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com'
       ,'I live at school where My address is dsfdsf@hotmail.com'
       ,'I live at school. My address is dsfdsf@hotmail.com'
       ,'at school my address is dsfdsf@hotmail.com'
       ,'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com'
       ,'d s f d s f a t h o t m a i l . c o m' 
       ); 

foreach($emails as $email) 
{ 
    $found = preg_match('/(.*?)((\@|a[_. -]*t)[\w .-]*?$)/', $email, $matches); 
    if($found) 
    { 
     echo 'Username: ' . $matches[1] . ', Domain: ' . $matches[2] . "\n"; 
    } 
} 

Output:

Username: dsfatasdfasdf asd dsfasdf dsfdsf, Domain: @hotmail.com 
Username: I live at school where My address is dsfdsf, Domain: @hotmail.com 
Username: I live at school. My address is dsfdsf, Domain: @hotmail.com 
Username: at school my address is dsfdsf, Domain: @hotmail.com 
Username: dsf a t asdfasdf asd dsfasdf dsfdsf, Domain: @hotmail.com 
Username: d s f d s f , Domain: a t h o t m a i l . c o m 
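The answer prints the two groups but stops short of the masked string the question asks for. A minimal sketch of that last step, assuming the answer's pattern unchanged: PREG_OFFSET_CAPTURE gives the byte offset of group 2, and substr_replace swaps exactly that span for the mask. The function name mask_address and the default '***' mask are hypothetical.

```php
<?php
// Hypothetical helper built on the answer's pattern.
function mask_address(string $text, string $mask = '***'): string
{
    // Group 1: text before the marker; group 2: marker plus domain to the end.
    $pattern = '/(.*?)((\@|a[_. -]*t)[\w .-]*?$)/';

    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return $text; // no "@"/"a t" marker followed by a domain: unchanged
    }

    [$domain, $offset] = $m[2]; // matched text of group 2 and its byte offset
    return substr_replace($text, $mask, $offset, strlen($domain));
}

echo mask_address('at school my address is dsfdsf@hotmail.com'), "\n";
echo mask_address('d s f d s f a t h o t m a i l . c o m'), "\n";
```

Replacing by offset rather than by the matched text keeps the edit confined to the span the regex actually found.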
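function clean_text($text){
    // [\w.-]: the hyphen goes last so it is a literal, not a range
    // (newer PCRE versions reject the original [\w-\.])
    $pattern = '/\w+[\w.-]*(\@\w+((-\w+)|(\w*))\.[a-z]{2,3})/i';
    preg_match($pattern, $text, $matches);

    // Replace only the "@domain.tld" part captured in group 1
    return (isset($matches[1])) ? str_replace($matches[1], "****", $text) : $text;
}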

The only one it doesn't match is your last one, but you get the idea.

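This is pretty good, but I really need to catch the "at" case, and usually the "a t" case (a, any non-alphanumeric character, t). Thanks for the effort so far. – 2010-07-27 16:33:50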

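Note that it doesn't call preg_replace directly; instead it uses preg_match and then replaces using the match index for the second part of the email. So that should at least point you in the right direction. – cynicaljoy 2010-07-27 16:43:24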
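The "a, any non-alphanumeric character, t" rule from the comment can be written as a single word-bounded fragment. A minimal sketch, assuming "non-alphanumeric" means anything outside a-z and 0-9 (matched case-insensitively); the variable names and sample strings are illustrative only.

```php
<?php
// "a", then any run of characters that are not letters or digits, then "t",
// with word boundaries so that words such as "cat" or "art" are not caught.
$marker = '/\ba[^a-z0-9]*t\b/i';

$samples = ['at', 'a t', 'a.t', 'a_t', 'a - t', 'cat', 'art'];
foreach ($samples as $sample) {
    $hit = preg_match($marker, $sample) === 1;
    echo str_pad("'$sample'", 8), $hit ? 'marker' : 'no match', "\n";
}
```

The unbounded "*" lets the gap between "a" and "t" grow arbitrarily long, including across "@" or newlines; capping it (for example with {0,3}) is a reasonable tweak.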

Here is a Perl script that can be adapted to PHP:

my @l = (
'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
'I live at school where My address is dsfdsf@hotmail.com',
'I live at school. My address is dsfdsf@hotmail.com',
'at school my address is dsfdsf@hotmail.com',
'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com',
'd s f d s f a t h o t m a i l . c o m' 
); 

foreach(@l) { 
    s/(\@|a[_. -]*t)[\w .-]*?$/****/; 
    print $_,"\n"; 
} 

Output:

dsfatasdfasdf asd dsfasdf dsfdsf**** 
I live at school where My address is dsfdsf**** 
I live at school. My address is dsfdsf**** 
at school my address is dsfdsf**** 
dsf a t asdfasdf asd dsfasdf dsfdsf**** 
d s f d s f ****
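The Perl substitution ports to PHP more or less one-to-one with preg_replace. A minimal sketch, assuming the same pattern without modification; the function name mask_tail is hypothetical. preg_replace replaces every match whereas Perl's s/// without /g replaces only the first, but since each match here runs to the end of the string there is only ever one.

```php
<?php
// Hypothetical PHP port of the Perl substitution shown above.
function mask_tail(string $text): string
{
    // Leftmost "@" or "a t"-style marker from which only word characters,
    // spaces, dots and hyphens remain until the end of the string.
    return preg_replace('/(\@|a[_. -]*t)[\w .-]*?$/', '****', $text);
}

$inputs = [
    'dsfatasdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'I live at school where My address is dsfdsf@hotmail.com',
    'I live at school. My address is dsfdsf@hotmail.com',
    'at school my address is dsfdsf@hotmail.com',
    'dsf a t asdfasdf asd dsfasdf dsfdsf@hotmail.com',
    'd s f d s f a t h o t m a i l . c o m',
];

foreach ($inputs as $input) {
    echo mask_tail($input), "\n";
}
```

What makes this work where the question's pattern fails is the "$" anchor combined with a character class that excludes "@": an early "at" cannot reach the end of the string without crossing the real "@", so the engine moves on to the later marker.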