2008-10-30 54 views
2

JS Regex for Human Names

I'm looking for a good JavaScript RegEx to convert names to proper case. For example:

John SMITH = John Smith 

Mary O'SMITH = Mary O'Smith 

E.t MCHYPHEN-SMITH = E.T McHyphen-Smith 

John Middlename SMITH = John Middlename Smith 

Well, you get the idea.

Has anyone come up with a comprehensive solution?

Answers

1

Something like this?

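function fix_name(name) { 
    // Called once per match: prefix is an optional "Mc"/"Mac", word is the rest. 
    var replacer = function (whole, prefix, word) { 
        var ret = []; 
        if (prefix) { 
            // Normalize the prefix to "Mc" or "Mac". 
            ret.push(prefix.charAt(0).toUpperCase()); 
            ret.push(prefix.slice(1).toLowerCase()); 
        } 
        // Capitalize the first letter of the word and lowercase the rest. 
        ret.push(word.charAt(0).toUpperCase()); 
        ret.push(word.slice(1).toLowerCase()); 
        return ret.join(''); 
    }; 
    // Match each run of letters, splitting off a leading Mc/Mac if present. 
    var pattern = /\b(ma?c)?([a-z]+)/ig; 
    return name.replace(pattern, replacer); 
} 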
0

Unfortunately, there are too many different name formats to do this correctly. John-Joe MacDonald is always going to be a troublemaker!

+0

You're right! He's still the same rapscallion he was in primary school! – eyelidlessness 2008-10-30 16:07:48

+1

You should see what he did to Lillian da Silva's satchel, the little rascal! – harriyott 2008-10-30 17:22:20

0

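Agreed, it will never be perfect, but I'm hoping to cover the most common cases. It's pretty much a matter of capitalizing the first letter of every "word" and treating hyphens and apostrophes as if they were spaces.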
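
The approach described above (capitalize every "word", with hyphens and apostrophes acting as separators) could be sketched roughly as follows. The function name `capitalizeWords` is made up for illustration, and the pattern only covers unaccented ASCII letters, so treat it as a starting point rather than a complete solution.

```javascript
// Hypothetical helper, not taken from any answer in this thread.
// The pattern matches runs of ASCII letters only, so spaces, hyphens,
// apostrophes and periods all end the current "word".
function capitalizeWords(name) {
    return name.replace(/[a-z]+/gi, function (word) {
        // First letter upper case, remaining letters lower case.
        return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
    });
}

console.log(capitalizeWords("John SMITH"));         // John Smith
console.log(capitalizeWords("Mary O'SMITH"));       // Mary O'Smith
console.log(capitalizeWords("JOHN-JOE MACDONALD")); // John-Joe Macdonald
```

The last line shows the gap this leaves: prefixes such as Mc and Mac need separate handling.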

0

Wimps!.... Here's my second attempt. It handles "John SMITH", "Mary O'SMITH", "John Middlename SMITH", "E.t MCHYPHEN-SMITH" and "John-Joe MacDonald".

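// Verbatim string keeps the regex escapes intact; IgnoreCase lets "MC", "Mc" and "MAC" all match. 
Regex fixnames = new Regex(@"(Ma?c)?(\w)(\w*)(\W*)", RegexOptions.IgnoreCase); 
string newName = fixnames.Replace(badName, NameFixer); 


public static string NameFixer(Match match) 
{ 
    string mc = ""; 
    // Group 1 is the optional Mc/Mac prefix. 
    if (match.Groups[1].Captures.Count > 0) 
    { 
     // "Mac" is three characters long, "Mc" is two. 
     if (match.Groups[1].Captures[0].Length == 3) 
      mc = "Mac"; 
     else 
      mc = "Mc"; 
    } 

    // Prefix + upper-cased first letter + lower-cased rest + trailing separators. 
    return 
     mc 
     +match.Groups[2].Captures[0].Value.ToUpper() 
     +match.Groups[3].Captures[0].Value.ToLower() 
     +match.Groups[4].Captures[0].Value; 
} 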

Note: By the time I realized you wanted a JavaScript solution rather than a .NET one, I was having too much fun to stop....
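
Since the question asked for JavaScript, here is a rough port of the same idea: an optional Mc/Mac prefix, the first character of the word, the rest of the word, and any trailing non-word characters. The name `fixName` is an assumption made for this sketch, and it has not been checked against the .NET version beyond the sample names shown.

```javascript
// Hypothetical JavaScript port of the regex-plus-callback approach above.
function fixName(badName) {
    var pattern = /(ma?c)?(\w)(\w*)(\W*)/gi;
    return badName.replace(pattern, function (whole, mc, first, rest, sep) {
        var prefix = "";
        if (mc) {
            // "Mac" is three characters long, "Mc" is two.
            prefix = mc.length === 3 ? "Mac" : "Mc";
        }
        // Upper-case the first letter, lower-case the rest, keep separators.
        return prefix + first.toUpperCase() + rest.toLowerCase() + sep;
    });
}

console.log(fixName("E.t MCHYPHEN-SMITH")); // E.T McHyphen-Smith
console.log(fixName("john-joe MACDONALD")); // John-Joe MacDonald
console.log(fixName("Mack"));               // MacK
```

The last call shows a known weakness of any pattern this simple: every name that starts with "Mac" is treated as prefixed, which is why a more thorough solution needs an exception list.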

1

Mark Summerfield does a comprehensive job of this with Lingua::EN::NameCase:

KEITH             Keith 
LEIGH-WILLIAMS    Leigh-Williams 
MCCARTHY          McCarthy 
O'CALLAGHAN       O'Callaghan 
ST. JOHN          St. John 
VON STREIT        von Streit 
VAN DYKE          van Dyke 
AP LLWYD DAFYDD   ap Llwyd Dafydd 
henry viii        Henry VIII 
louis xiv         Louis XIV 

It is written in Perl, but it makes heavy use of regular expressions, so you should be able to glean some good techniques from it.

Here is the relevant source:

sub nc { 

    croak "Usage: nc [[\\]\$SCALAR]" 
     if scalar @_ > 1 or (ref $_[0] and ref $_[0] ne 'SCALAR') ; 

    local($_) = @_ if @_ ; 
    $_ = ${$_} if ref($_) ;   # Replace reference with value. 

    $_ = lc ;       # Lowercase the lot. 
    s{ \b (\w) }{\u$1}gox ;   # Uppercase first letter of every word. 
    s{ (\'\w) \b }{\L$1}gox ;   # Lowercase 's. 

    # Name case Mcs and Macs - taken straight from NameParse.pm incl. comments. 
    # Exclude names with 1-2 letters after prefix like Mack, Macky, Mace 
    # Exclude names ending in a, c, i, o, z, or j, which are typically Polish or Italian 

    if (/\bMac[A-Za-z]{2,}[^aciozj]\b/o or /\bMc/o) { 
     s/\b(Ma?c)([A-Za-z]+)/$1\u$2/go ; 

     # Now correct for "Mac" exceptions 
     s/\bMacEvicius/Macevicius/go ; # Lithuanian 
     s/\bMacHado/Machado/go ;  # Portuguese 
     s/\bMacHar/Machar/go ; 
     s/\bMacHin/Machin/go ; 
     s/\bMacHlin/Machlin/go ; 
     s/\bMacIas/Macias/go ; 
     s/\bMacIulis/Maciulis/go ; 
     s/\bMacKie/Mackie/go ; 
     s/\bMacKle/Mackle/go ; 
     s/\bMacKlin/Macklin/go ; 
     s/\bMacQuarie/Macquarie/go ; 
     s/\bMacOmber/Macomber/go ; 
     s/\bMacIn/Macin/go ; 
     s/\bMacKintosh/Mackintosh/go ; 
     s/\bMacKen/Macken/go ; 
     s/\bMacHen/Machen/go ; 
     s/\bMacisaac/MacIsaac/go ; 
     s/\bMacHiel/Machiel/go ; 
     s/\bMacIol/Maciol/go ; 
     s/\bMacKell/Mackell/go ; 
     s/\bMacKlem/Macklem/go ; 
     s/\bMacKrell/Mackrell/go ; 
     s/\bMacLin/Maclin/go ; 
     s/\bMacKey/Mackey/go ; 
     s/\bMacKley/Mackley/go ; 
     s/\bMacHell/Machell/go ; 
     s/\bMacHon/Machon/go ; 
    } 
    s/Macmurdo/MacMurdo/go ; 

    # Fixes for "son (daughter) of" etc. in various languages. 
    s{ \b Al(?=\s+\w) }{al}gox ; # al Arabic or forename Al. 
    s{ \b Ap  \b }{ap}gox ;  # ap Welsh. 
    s{ \b Ben(?=\s+\w) }{ben}gox ; # ben Hebrew or forename Ben. 
    s{ \b Dell([ae])\b }{dell$1}gox ; # della and delle Italian. 
    s{ \b D([aeiu]) \b }{d$1}gox ;  # da, de, di Italian; du French. 
    s{ \b De([lr]) \b }{de$1}gox ;  # del Italian; der Dutch/Flemish. 
    s{ \b El  \b }{el}gox unless $SPANISH ; # el Greek or El Spanish. 
    s{ \b La  \b }{la}gox unless $SPANISH ; # la French or La Spanish. 
    s{ \b L([eo]) \b }{l$1}gox ;  # lo Italian; le French. 
    s{ \b Van(?=\s+\w) }{van}gox ; # van German or forename Van. 
    s{ \b Von  \b }{von}gox ; # von Dutch/Flemish 

    # Fixes for roman numeral names, e.g. Henry VIII, up to 89, LXXXIX 
    s{ \b ((?: [Xx]{1,3} | [Xx][Ll] | [Ll][Xx]{0,3})? 
      (?: [Ii]{1,3} | [Ii][VvXx] | [Vv][Ii]{0,3})?) \b }{\U$1}gox ; 

    $_ ; 
}
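
A few of these techniques carry over to JavaScript fairly directly. The sketch below is a partial, unofficial adaptation, not a port of the module: the name `nameCase` is made up here, only the Mc prefix, three of the particles (von, van, ap) and the Roman numeral rule are included, and the Mac exception list is left out entirely.

```javascript
// Hypothetical, partial adaptation of the Perl routine above.
function nameCase(name) {
    var s = name.toLowerCase();                       // Lowercase the lot.
    // Uppercase the first letter of every word.
    s = s.replace(/\b\w/g, function (c) { return c.toUpperCase(); });
    // Lowercase a trailing 's that the previous step capitalized.
    s = s.replace(/'\w\b/g, function (m) { return m.toLowerCase(); });
    // Capitalize the letter after an Mc prefix (Mac is skipped in this sketch).
    s = s.replace(/\bMc([a-z])/g, function (m, c) { return "Mc" + c.toUpperCase(); });
    // Lowercase a few particles; Van only when another word follows it.
    s = s.replace(/\b(Von|Ap)\b/g, function (m) { return m.toLowerCase(); });
    s = s.replace(/\bVan(?=\s+\w)/g, "van");
    // Uppercase Roman numerals up to LXXXIX. The pattern can also match the
    // empty string at a word boundary, in which case nothing changes.
    s = s.replace(
        /\b((?:x{1,3}|xl|lx{0,3})?(?:i{1,3}|i[vx]|vi{0,3})?)\b/gi,
        function (m) { return m.toUpperCase(); }
    );
    return s;
}

console.log(nameCase("MCCARTHY"));        // McCarthy
console.log(nameCase("VON STREIT"));      // von Streit
console.log(nameCase("AP LLWYD DAFYDD")); // ap Llwyd Dafydd
console.log(nameCase("henry viii"));      // Henry VIII
```

As in the Perl original, the Roman numeral rule is purely textual, so a short name that happens to look like a numeral (for example "Li") would be uppercased as well.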