2010-07-30 125 views
52

I am trying to replace accented characters with their unaccented equivalents in PHP. Below is what I am currently doing.

$string = "Éric Cantona"; 
    $strict = strtolower($string); 

    echo "After Lower: ".$strict; 

    $patterns[0] = '/[á|â|à|å|ä]/'; 
    $patterns[1] = '/[ð|é|ê|è|ë]/'; 
    $patterns[2] = '/[í|î|ì|ï]/'; 
    $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/'; 
    $patterns[4] = '/[ú|û|ù|ü]/'; 
    $patterns[5] = '/æ/'; 
    $patterns[6] = '/ç/'; 
    $patterns[7] = '/ß/'; 
    $replacements[0] = 'a'; 
    $replacements[1] = 'e'; 
    $replacements[2] = 'i'; 
    $replacements[3] = 'o'; 
    $replacements[4] = 'u'; 
    $replacements[5] = 'ae'; 
    $replacements[6] = 'c'; 
    $replacements[7] = 'ss'; 

    $strict = preg_replace($patterns, $replacements, $strict); 
    echo "Final: ".$strict; 

This gives me:

After Lower: éric cantona 
    Final: ric cantona 

The above gives me ric cantona, but I want the output to be eric cantona.

Can anyone help me see where I am going wrong?

+1

For what it's worth, I copied and pasted this, ran it verbatim, and got "eric cantona" (using PHP 5.2.9-4) – 2010-07-30 13:10:04

+1

@brandon It will depend on the encoding you save the file in. I assume Lizard saved it as utf-8 and you saved it as iso-8859-1. – troelskn 2010-07-30 13:12:22

+0

Which version of PHP are you using? – 2010-07-30 13:14:00

Answers

113

I tried all sorts of variations based on the answers listed, but the following worked:

$unwanted_array = array( 'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E', 
          'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U', 
          'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'Ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c', 
          'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o', 
          'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y'); 
$str = strtr($str, $unwanted_array); 
+9

Add these for Turkish support: 'Ğ'=>'G', 'İ'=>'I', 'Ş'=>'S', 'ğ'=>'g', 'ı'=>'i', 'ş'=>'s', 'ü'=>'u', – 2012-03-20 10:18:49

+4

Add these for Romanian support: 'ă'=>'a', 'Ă'=>'A', 'ş'=>'s', 'Ş'=>'S', 'ţ'=>'t', 'Ţ'=>'T' – Vlad 2013-03-24 21:52:42

+3

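There is a small error: 'ß' must not be transliterated as 'S'; it has to be replaced with 'ss'. This German-specific character is never used in uppercase. – KTB 2014-01-22 10:18:59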

2

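strtolower only works on iso-8859-1 encoded strings. You could try mb_strtolower.

Alternatively, since you need an extension for multibyte handling anyway, you might as well use iconv's transliteration support: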

iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text); 

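Edit:

It seems I was a bit too quick. You appear to be using iso-8859-1, so your current strategy will work. You just need to write the regular expression correctly. For example: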

'/(ð|é|ê|è|ë)/' 

instead of:

'/[ð|é|ê|è|ë]/' 
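As a minimal sketch of the mb_strtolower suggestion above, assuming the input is UTF-8 and the mbstring extension is loaded (the variable names are illustrative only):

```php
<?php
// "Éric Cantona" spelled with explicit UTF-8 bytes, so the example does not
// depend on the encoding this file happens to be saved in.
$name = "\xC3\x89ric Cantona";

// mb_strtolower() works on characters in the given encoding, not on bytes.
$lower = mb_strtolower($name, 'UTF-8');

echo $lower, "\n";
```

Passing the encoding explicitly avoids relying on the mbstring internal encoding setting.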
+0

strtolower seems to work with my code – Lizard 2010-07-30 13:10:22

+2

Really? `$how_I_like_php--;` – MvanGeest 2010-07-30 13:11:04

+0

@Mvan They are introducing Unicode support in PHP6 – NullUserException 2010-07-30 13:12:28

60

To remove diacritics, use iconv:

$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val); 

$val = iconv('UTF-8','ASCII//TRANSLIT',$val); 

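Note that PHP has a weird bug where it (sometimes?) needs a locale to be set with setlocale() for these conversions to work.

Edit: I just tested it, and it handles your diacritics out of the box: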

$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123"; 
echo iconv('UTF-8','ASCII//TRANSLIT',$val); 

Output:

a|a|a|a|a ?|e|e|e|e i|i|i|i o|o|o|?|o|o u|u|u|u ae c ss abc ABC 123 

So you may want to fix those two odd ones (ð and ø) by hand before calling iconv, or dig into PHP's internals and actually fix it.
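One way to handle the two characters that came out as "?" is to map them by hand before the iconv call. The sketch below is only an illustration: the function name premap_for_iconv is made up, and the targets ('d' for ð, 'o' for ø) are an assumed choice rather than anything iconv prescribes.

```php
<?php
/**
 * Replace the characters that iconv left as "?" in the test above.
 * The replacement letters are an assumption; pick whatever suits your data.
 */
function premap_for_iconv($utf8)
{
    $map = array(
        "\xC3\xB0" => 'd', // ð (U+00F0)
        "\xC3\xB8" => 'o', // ø (U+00F8)
    );
    return strtr($utf8, $map);
}

$val = premap_for_iconv("sm\xC3\xB8r"); // "smør" with the ø mapped by hand

// Transliteration output for other characters varies between platforms.
$ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $val);
```

The keys are written as byte escapes so the table does not depend on the encoding of the script file.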

+21

It is worth noting that iconv will raise an error and cut the string off at an "illegal character". To get around this you can use `iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $val)` – Rowan 2011-03-10 15:17:59

+1

This did not work here. With `iconv('ISO-8859-1', 'ASCII//TRANSLIT', $val)`, 'áêìõç' became '''''oc'. – 2014-09-02 19:44:44

+3

Does not work in PHP 5.3.23: è returns ? – RMiranda 2014-09-11 09:35:32

4

Disclaimer: I'm not supporting this answer anymore (I was blind at that time). But thanks for the up-votes =P

You can use this as a base. It is taken from WordPress, where it is used to generate pretty URLs (the entry point is the slugify() function):

/** 
* Converts all accent characters to ASCII characters. 
* 
* If there are no accent characters, then the string given is just returned. 
* 
* @param string $string Text that might have accent characters 
* @return string Filtered string with replaced "nice" characters. 
*/ 

function remove_accents($string) { 
if (!preg_match('/[\x80-\xff]/', $string)) 
    return $string; 
if (seems_utf8($string)) { 
    $chars = array(
    // Decompositions for Latin-1 Supplement 
    chr(195).chr(128) => 'A', chr(195).chr(129) => 'A', 
    chr(195).chr(130) => 'A', chr(195).chr(131) => 'A', 
    chr(195).chr(132) => 'A', chr(195).chr(133) => 'A', 
    chr(195).chr(135) => 'C', chr(195).chr(136) => 'E', 
    chr(195).chr(137) => 'E', chr(195).chr(138) => 'E', 
    chr(195).chr(139) => 'E', chr(195).chr(140) => 'I', 
    chr(195).chr(141) => 'I', chr(195).chr(142) => 'I', 
    chr(195).chr(143) => 'I', chr(195).chr(145) => 'N', 
    chr(195).chr(146) => 'O', chr(195).chr(147) => 'O', 
    chr(195).chr(148) => 'O', chr(195).chr(149) => 'O', 
    chr(195).chr(150) => 'O', chr(195).chr(153) => 'U', 
    chr(195).chr(154) => 'U', chr(195).chr(155) => 'U', 
    chr(195).chr(156) => 'U', chr(195).chr(157) => 'Y', 
    chr(195).chr(159) => 's', chr(195).chr(160) => 'a', 
    chr(195).chr(161) => 'a', chr(195).chr(162) => 'a', 
    chr(195).chr(163) => 'a', chr(195).chr(164) => 'a', 
    chr(195).chr(165) => 'a', chr(195).chr(167) => 'c', 
    chr(195).chr(168) => 'e', chr(195).chr(169) => 'e', 
    chr(195).chr(170) => 'e', chr(195).chr(171) => 'e', 
    chr(195).chr(172) => 'i', chr(195).chr(173) => 'i', 
    chr(195).chr(174) => 'i', chr(195).chr(175) => 'i', 
    chr(195).chr(177) => 'n', chr(195).chr(178) => 'o', 
    chr(195).chr(179) => 'o', chr(195).chr(180) => 'o', 
    chr(195).chr(181) => 'o', chr(195).chr(182) => 'o', 
    chr(195).chr(182) => 'o', chr(195).chr(185) => 'u', 
    chr(195).chr(186) => 'u', chr(195).chr(187) => 'u', 
    chr(195).chr(188) => 'u', chr(195).chr(189) => 'y', 
    chr(195).chr(191) => 'y', 
    // Decompositions for Latin Extended-A 
    chr(196).chr(128) => 'A', chr(196).chr(129) => 'a', 
    chr(196).chr(130) => 'A', chr(196).chr(131) => 'a', 
    chr(196).chr(132) => 'A', chr(196).chr(133) => 'a', 
    chr(196).chr(134) => 'C', chr(196).chr(135) => 'c', 
    chr(196).chr(136) => 'C', chr(196).chr(137) => 'c', 
    chr(196).chr(138) => 'C', chr(196).chr(139) => 'c', 
    chr(196).chr(140) => 'C', chr(196).chr(141) => 'c', 
    chr(196).chr(142) => 'D', chr(196).chr(143) => 'd', 
    chr(196).chr(144) => 'D', chr(196).chr(145) => 'd', 
    chr(196).chr(146) => 'E', chr(196).chr(147) => 'e', 
    chr(196).chr(148) => 'E', chr(196).chr(149) => 'e', 
    chr(196).chr(150) => 'E', chr(196).chr(151) => 'e', 
    chr(196).chr(152) => 'E', chr(196).chr(153) => 'e', 
    chr(196).chr(154) => 'E', chr(196).chr(155) => 'e', 
    chr(196).chr(156) => 'G', chr(196).chr(157) => 'g', 
    chr(196).chr(158) => 'G', chr(196).chr(159) => 'g', 
    chr(196).chr(160) => 'G', chr(196).chr(161) => 'g', 
    chr(196).chr(162) => 'G', chr(196).chr(163) => 'g', 
    chr(196).chr(164) => 'H', chr(196).chr(165) => 'h', 
    chr(196).chr(166) => 'H', chr(196).chr(167) => 'h', 
    chr(196).chr(168) => 'I', chr(196).chr(169) => 'i', 
    chr(196).chr(170) => 'I', chr(196).chr(171) => 'i', 
    chr(196).chr(172) => 'I', chr(196).chr(173) => 'i', 
    chr(196).chr(174) => 'I', chr(196).chr(175) => 'i', 
    chr(196).chr(176) => 'I', chr(196).chr(177) => 'i', 
    chr(196).chr(178) => 'IJ',chr(196).chr(179) => 'ij', 
    chr(196).chr(180) => 'J', chr(196).chr(181) => 'j', 
    chr(196).chr(182) => 'K', chr(196).chr(183) => 'k', 
    chr(196).chr(184) => 'k', chr(196).chr(185) => 'L', 
    chr(196).chr(186) => 'l', chr(196).chr(187) => 'L', 
    chr(196).chr(188) => 'l', chr(196).chr(189) => 'L', 
    chr(196).chr(190) => 'l', chr(196).chr(191) => 'L', 
    chr(197).chr(128) => 'l', chr(197).chr(129) => 'L', 
    chr(197).chr(130) => 'l', chr(197).chr(131) => 'N', 
    chr(197).chr(132) => 'n', chr(197).chr(133) => 'N', 
    chr(197).chr(134) => 'n', chr(197).chr(135) => 'N', 
    chr(197).chr(136) => 'n', chr(197).chr(137) => 'N', 
    chr(197).chr(138) => 'n', chr(197).chr(139) => 'N', 
    chr(197).chr(140) => 'O', chr(197).chr(141) => 'o', 
    chr(197).chr(142) => 'O', chr(197).chr(143) => 'o', 
    chr(197).chr(144) => 'O', chr(197).chr(145) => 'o', 
    chr(197).chr(146) => 'OE',chr(197).chr(147) => 'oe', 
    chr(197).chr(148) => 'R',chr(197).chr(149) => 'r', 
    chr(197).chr(150) => 'R',chr(197).chr(151) => 'r', 
    chr(197).chr(152) => 'R',chr(197).chr(153) => 'r', 
    chr(197).chr(154) => 'S',chr(197).chr(155) => 's', 
    chr(197).chr(156) => 'S',chr(197).chr(157) => 's', 
    chr(197).chr(158) => 'S',chr(197).chr(159) => 's', 
    chr(197).chr(160) => 'S', chr(197).chr(161) => 's', 
    chr(197).chr(162) => 'T', chr(197).chr(163) => 't', 
    chr(197).chr(164) => 'T', chr(197).chr(165) => 't', 
    chr(197).chr(166) => 'T', chr(197).chr(167) => 't', 
    chr(197).chr(168) => 'U', chr(197).chr(169) => 'u', 
    chr(197).chr(170) => 'U', chr(197).chr(171) => 'u', 
    chr(197).chr(172) => 'U', chr(197).chr(173) => 'u', 
    chr(197).chr(174) => 'U', chr(197).chr(175) => 'u', 
    chr(197).chr(176) => 'U', chr(197).chr(177) => 'u', 
    chr(197).chr(178) => 'U', chr(197).chr(179) => 'u', 
    chr(197).chr(180) => 'W', chr(197).chr(181) => 'w', 
    chr(197).chr(182) => 'Y', chr(197).chr(183) => 'y', 
    chr(197).chr(184) => 'Y', chr(197).chr(185) => 'Z', 
    chr(197).chr(186) => 'z', chr(197).chr(187) => 'Z', 
    chr(197).chr(188) => 'z', chr(197).chr(189) => 'Z', 
    chr(197).chr(190) => 'z', chr(197).chr(191) => 's', 
    // Euro Sign 
    chr(226).chr(130).chr(172) => 'E', 
    // GBP (Pound) Sign 
    chr(194).chr(163) => ''); 
    $string = strtr($string, $chars); 
} else { 
    // Assume ISO-8859-1 if not UTF-8 
    $chars['in'] = chr(128).chr(131).chr(138).chr(142).chr(154).chr(158) 
    .chr(159).chr(162).chr(165).chr(181).chr(192).chr(193).chr(194) 
    .chr(195).chr(196).chr(197).chr(199).chr(200).chr(201).chr(202) 
    .chr(203).chr(204).chr(205).chr(206).chr(207).chr(209).chr(210) 
    .chr(211).chr(212).chr(213).chr(214).chr(216).chr(217).chr(218) 
    .chr(219).chr(220).chr(221).chr(224).chr(225).chr(226).chr(227) 
    .chr(228).chr(229).chr(231).chr(232).chr(233).chr(234).chr(235) 
    .chr(236).chr(237).chr(238).chr(239).chr(241).chr(242).chr(243) 
    .chr(244).chr(245).chr(246).chr(248).chr(249).chr(250).chr(251) 
    .chr(252).chr(253).chr(255); 
    $chars['out'] = "EfSZszYcYuAAAAAACEEEEIIIINOOOOOOUUUUYaaaaaaceeeeiiiinoooooouuuuyy"; 
    $string = strtr($string, $chars['in'], $chars['out']); 
    $double_chars['in'] = array(chr(140), chr(156), chr(198), chr(208), chr(222), chr(223), chr(230), chr(240), chr(254)); 
    $double_chars['out'] = array('OE', 'oe', 'AE', 'DH', 'TH', 'ss', 'ae', 'dh', 'th'); 
    $string = str_replace($double_chars['in'], $double_chars['out'], $string); 
} 
return $string; 
} 

/** 
* Checks to see if a string is utf8 encoded. 
* 
* @author bmorel at ssi dot fr 
* 
* @param string $Str The string to be checked 
* @return bool True if $Str fits a UTF-8 model, false otherwise. 
*/ 
function seems_utf8($Str) { # by bmorel at ssi dot fr 
$length = strlen($Str); 
for ($i = 0; $i < $length; $i++) { 
    if (ord($Str[$i]) < 0x80) continue; # 0bbbbbbb 
    elseif ((ord($Str[$i]) & 0xE0) == 0xC0) $n = 1; # 110bbbbb 
    elseif ((ord($Str[$i]) & 0xF0) == 0xE0) $n = 2; # 1110bbbb 
    elseif ((ord($Str[$i]) & 0xF8) == 0xF0) $n = 3; # 11110bbb 
    elseif ((ord($Str[$i]) & 0xFC) == 0xF8) $n = 4; # 111110bb 
    elseif ((ord($Str[$i]) & 0xFE) == 0xFC) $n = 5; # 1111110b 
    else return false; # Does not match any model 
    for ($j = 0; $j < $n; $j++) { # n bytes matching 10bbbbbb follow ? 
    if ((++$i == $length) || ((ord($Str[$i]) & 0xC0) != 0x80)) 
    return false; 
    } 
} 
return true; 
} 

function utf8_uri_encode($utf8_string, $length = 0) { 
$unicode = ''; 
$values = array(); 
$num_octets = 1; 
$unicode_length = 0; 
$string_length = strlen($utf8_string); 
for ($i = 0; $i < $string_length; $i++) { 
    $value = ord($utf8_string[$i]); 
    if ($value < 128) { 
    if ($length && ($unicode_length >= $length)) 
    break; 
    $unicode .= chr($value); 
    $unicode_length++; 
    } else { 
    if (count($values) == 0) $num_octets = ($value < 224) ? 2 : 3; 
    $values[] = $value; 
    if ($length && ($unicode_length + ($num_octets * 3)) > $length) 
    break; 
    if (count($values) == $num_octets) { 
    if ($num_octets == 3) { 
    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]) . '%' . dechex($values[2]); 
    $unicode_length += 9; 
    } else { 
    $unicode .= '%' . dechex($values[0]) . '%' . dechex($values[1]); 
    $unicode_length += 6; 
    } 
    $values = array(); 
    $num_octets = 1; 
    } 
    } 
} 
return $unicode; 
} 

/** 
* Sanitizes title, replacing whitespace with dashes. 
* 
* Limits the output to alphanumeric characters, underscore (_) and dash (-). 
* Whitespace becomes a dash. 
* 
* @param string $title The title to be sanitized. 
* @return string The sanitized title. 
*/ 
function slugify($title) { 
$title = strip_tags($title); 
// Preserve escaped octets. 
$title = preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '---$1---', $title); 
// Remove percent signs that are not part of an octet. 
$title = str_replace('%', '', $title); 
// Restore octets. 
$title = preg_replace('|---([a-fA-F0-9][a-fA-F0-9])---|', '%$1', $title); 
$title = remove_accents($title); 
if (seems_utf8($title)) { 
    if (function_exists('mb_strtolower')) { 
    $title = mb_strtolower($title, 'UTF-8'); 
    } 
    $title = utf8_uri_encode($title, 200); 
} 
$title = strtolower($title); 
$title = preg_replace('/&.+?;/', '', $title); // kill entities 
$title = preg_replace('/[^%a-z0-9 _-]/', '', $title); 
$title = preg_replace('/\s+/', '-', $title); 
$title = preg_replace('|-+|', '-', $title); 
$title = trim($title, '-'); 
return $title; 
} 
+0

Thank you. I wanted to do this on a WordPress site and did not realize WordPress had a built-in function for it :) – 2017-02-13 18:55:58

8

So I found this on the PHP manual page for the preg_replace function:

// replace accented chars 

$string = "Zacarías Ferreíra"; // my definition for string variable 
$accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/'; 

$string_encoded = htmlentities($string,ENT_NOQUOTES,'UTF-8'); 

$string = preg_replace($accents,'$1',$string_encoded); 

If you have an encoding problem you may get something like "ZacarÃÂasFerreÃÂra"; in that case just decode the string first and then use the code above:

$string = utf8_decode("Zacarías Ferreíra"); 
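The entity-based approach can be wrapped in a function, as sketched below. The function name strip_accents_via_entities is hypothetical, and the input is assumed to be UTF-8. Because utf8_decode() is deprecated as of PHP 8.2, the sketch uses mb_convert_encoding() for the decoding step instead (this assumes the mbstring extension).

```php
<?php
/**
 * Strip accents by way of HTML entities: "í" becomes "&iacute;", and the
 * regex keeps only the base letter(s) of the entity name.
 * Only characters that have such a named entity are affected.
 */
function strip_accents_via_entities($utf8)
{
    $accents = '/&([A-Za-z]{1,2})(grave|acute|circ|cedil|uml|lig);/';
    $encoded = htmlentities($utf8, ENT_NOQUOTES, 'UTF-8');
    return preg_replace($accents, '$1', $encoded);
}

// "Zacarías Ferreíra" written with explicit UTF-8 bytes.
$plain = strip_accents_via_entities("Zacar\xC3\xADas Ferre\xC3\xADra");

// Undo one level of double encoding: the bytes below are "í" that was
// encoded to UTF-8 twice; converting UTF-8 to ISO-8859-1 removes one level.
$fixed = mb_convert_encoding("\xC3\x83\xC2\xAD", 'ISO-8859-1', 'UTF-8');
```

Note that htmlentities() also encodes `&`, `<` and `>`, so those stay as entities in the result unless they are decoded again afterwards.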
30

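I just came across the answer from Lizard, which is very useful, especially when you are doing some classification. Isn't it beautiful how many characters we need to say mostly the same thing ;)

If anyone is looking for an all-in-one solution (as discussed in the comments above), here is the copy & paste: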

/** 
* Replace language-specific characters by ASCII-equivalents. 
* @param string $s 
* @return string 
*/ 
public static function normalizeChars($s) { 
    $replace = array(
     'ъ'=>'-', 'Ь'=>'-', 'Ъ'=>'-', 'ь'=>'-', 
     'Ă'=>'A', 'Ą'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae', 
     'Þ'=>'B', 
     'Ć'=>'C', 'ץ'=>'C', 'Ç'=>'C', 
     'È'=>'E', 'Ę'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E', 
     'Ğ'=>'G', 
     'İ'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I', 
     'Ł'=>'L', 
     'Ñ'=>'N', 'Ń'=>'N', 
     'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe', 
     'Ş'=>'S', 'Ś'=>'S', 'Ș'=>'S', 'Š'=>'S', 
     'Ț'=>'T', 
     'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue', 
     'Ý'=>'Y', 
     'Ź'=>'Z', 'Ž'=>'Z', 'Ż'=>'Z', 
     'â'=>'a', 'ǎ'=>'a', 'ą'=>'a', 'á'=>'a', 'ă'=>'a', 'ã'=>'a', 'Ǎ'=>'a', 'а'=>'a', 'А'=>'a', 'å'=>'a', 'à'=>'a', 'א'=>'a', 'Ǻ'=>'a', 'Ā'=>'a', 'ǻ'=>'a', 'ā'=>'a', 'ä'=>'ae', 'æ'=>'ae', 'Ǽ'=>'ae', 'ǽ'=>'ae', 
     'б'=>'b', 'ב'=>'b', 'Б'=>'b', 'þ'=>'b', 
     'ĉ'=>'c', 'Ĉ'=>'c', 'Ċ'=>'c', 'ć'=>'c', 'ç'=>'c', 'ц'=>'c', 'צ'=>'c', 'ċ'=>'c', 'Ц'=>'c', 'Č'=>'c', 'č'=>'c', 'Ч'=>'ch', 'ч'=>'ch', 
     'ד'=>'d', 'ď'=>'d', 'Đ'=>'d', 'Ď'=>'d', 'đ'=>'d', 'д'=>'d', 'Д'=>'D', 'ð'=>'d', 
     'є'=>'e', 'ע'=>'e', 'е'=>'e', 'Е'=>'e', 'Ə'=>'e', 'ę'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'Ē'=>'e', 'Ė'=>'e', 'ė'=>'e', 'ě'=>'e', 'Ě'=>'e', 'Є'=>'e', 'Ĕ'=>'e', 'ê'=>'e', 'ə'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e', 
     'ф'=>'f', 'ƒ'=>'f', 'Ф'=>'f', 
     'ġ'=>'g', 'Ģ'=>'g', 'Ġ'=>'g', 'Ĝ'=>'g', 'Г'=>'g', 'г'=>'g', 'ĝ'=>'g', 'ğ'=>'g', 'ג'=>'g', 'Ґ'=>'g', 'ґ'=>'g', 'ģ'=>'g', 
     'ח'=>'h', 'ħ'=>'h', 'Х'=>'h', 'Ħ'=>'h', 'Ĥ'=>'h', 'ĥ'=>'h', 'х'=>'h', 'ה'=>'h', 
     'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', 'į'=>'i', 'ĭ'=>'i', 'ı'=>'i', 'Ĭ'=>'i', 'И'=>'i', 'ĩ'=>'i', 'ǐ'=>'i', 'Ĩ'=>'i', 'Ǐ'=>'i', 'и'=>'i', 'Į'=>'i', 'י'=>'i', 'Ї'=>'i', 'Ī'=>'i', 'І'=>'i', 'ї'=>'i', 'і'=>'i', 'ī'=>'i', 'ij'=>'ij', 'IJ'=>'ij', 
     'й'=>'j', 'Й'=>'j', 'Ĵ'=>'j', 'ĵ'=>'j', 'я'=>'ja', 'Я'=>'ja', 'Э'=>'je', 'э'=>'je', 'ё'=>'jo', 'Ё'=>'jo', 'ю'=>'ju', 'Ю'=>'ju', 
     'ĸ'=>'k', 'כ'=>'k', 'Ķ'=>'k', 'К'=>'k', 'к'=>'k', 'ķ'=>'k', 'ך'=>'k', 
     'Ŀ'=>'l', 'ŀ'=>'l', 'Л'=>'l', 'ł'=>'l', 'ļ'=>'l', 'ĺ'=>'l', 'Ĺ'=>'l', 'Ļ'=>'l', 'л'=>'l', 'Ľ'=>'l', 'ľ'=>'l', 'ל'=>'l', 
     'מ'=>'m', 'М'=>'m', 'ם'=>'m', 'м'=>'m', 
     'ñ'=>'n', 'н'=>'n', 'Ņ'=>'n', 'ן'=>'n', 'ŋ'=>'n', 'נ'=>'n', 'Н'=>'n', 'ń'=>'n', 'Ŋ'=>'n', 'ņ'=>'n', 'ʼn'=>'n', 'Ň'=>'n', 'ň'=>'n', 
     'о'=>'o', 'О'=>'o', 'ő'=>'o', 'õ'=>'o', 'ô'=>'o', 'Ő'=>'o', 'ŏ'=>'o', 'Ŏ'=>'o', 'Ō'=>'o', 'ō'=>'o', 'ø'=>'o', 'ǿ'=>'o', 'ǒ'=>'o', 'ò'=>'o', 'Ǿ'=>'o', 'Ǒ'=>'o', 'ơ'=>'o', 'ó'=>'o', 'Ơ'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe', 
     'פ'=>'p', 'ף'=>'p', 'п'=>'p', 'П'=>'p', 
     'ק'=>'q', 
     'ŕ'=>'r', 'ř'=>'r', 'Ř'=>'r', 'ŗ'=>'r', 'Ŗ'=>'r', 'ר'=>'r', 'Ŕ'=>'r', 'Р'=>'r', 'р'=>'r', 
     'ș'=>'s', 'с'=>'s', 'Ŝ'=>'s', 'š'=>'s', 'ś'=>'s', 'ס'=>'s', 'ş'=>'s', 'С'=>'s', 'ŝ'=>'s', 'Щ'=>'sch', 'щ'=>'sch', 'ш'=>'sh', 'Ш'=>'sh', 'ß'=>'ss', 
     'т'=>'t', 'ט'=>'t', 'ŧ'=>'t', 'ת'=>'t', 'ť'=>'t', 'ţ'=>'t', 'Ţ'=>'t', 'Т'=>'t', 'ț'=>'t', 'Ŧ'=>'t', 'Ť'=>'t', '™'=>'tm', 
     'ū'=>'u', 'у'=>'u', 'Ũ'=>'u', 'ũ'=>'u', 'Ư'=>'u', 'ư'=>'u', 'Ū'=>'u', 'Ǔ'=>'u', 'ų'=>'u', 'Ų'=>'u', 'ŭ'=>'u', 'Ŭ'=>'u', 'Ů'=>'u', 'ů'=>'u', 'ű'=>'u', 'Ű'=>'u', 'Ǖ'=>'u', 'ǔ'=>'u', 'Ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'У'=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'Ǚ'=>'u', 'Ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue', 
     'в'=>'v', 'ו'=>'v', 'В'=>'v', 
     'ש'=>'w', 'ŵ'=>'w', 'Ŵ'=>'w', 
     'ы'=>'y', 'ŷ'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', 'Ŷ'=>'y', 
     'Ы'=>'y', 'ž'=>'z', 'З'=>'z', 'з'=>'z', 'ź'=>'z', 'ז'=>'z', 'ż'=>'z', 'ſ'=>'z', 'Ж'=>'zh', 'ж'=>'zh' 
    ); 
    return strtr($s, $replace); 
} 

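Note regarding the German umlauts: they are mapped to two letters (ä => ae).

Edited with some minor changes: included more characters based on the post from user3682119 (except the copyright symbol), and incorporated the comment from daker.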

+1

Thanks for the updated list from @Lizard. Still, at least for Polish some characters are missing: ''''=>'A','''=>'a','Ć'=>'C','ć'=>'c',' '''''','''''','''','','','','','' ''''''=>'S','ś'=>'s','Ž'=>'Z','ż'=>'z','''=>'Z',''' '=>'z'' – kasimir 2014-06-25 08:22:36

+0

Thanks a lot, added :) – BurninLeo 2014-06-25 19:24:55

+2

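This is great; however, the lowercase entries are mixed with uppercase ones, unlike the uppercase entries. For example: d => д, d => Д. That is wrong; only D => Д should be in this table, I think, right? – Kwaadpepper 2015-11-22 12:12:28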

1

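I know this question was asked a long, long time ago...

I was looking for a short and elegant solution but could not find a satisfying one, for two reasons:

First, most existing solutions replace one list of characters with another list of characters. Unfortunately, that requires the PHP script file itself to use a specific encoding, which may be undesirable.

Second, using iconv seems a good approach, but it is not enough, because the result of converting a single character may be one character, two characters, or a fatal exception.

So I wrote a small function that does the job: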

function replaceAccent($string, $replacement = '_') 
{ 
    $alnumPattern = '/^[a-zA-Z0-9 ]+$/'; 

    if (preg_match($alnumPattern, $string)) { 
     return $string; 
    } 

    $ret = array_map(
     function ($chr) use ($alnumPattern, $replacement) { 
      if (preg_match($alnumPattern, $chr)) { 
       return $chr; 
      } else { 
       $chr = @iconv('ISO-8859-1', 'ASCII//TRANSLIT', $chr); 
       if (strlen($chr) == 1) { 
        return $chr; 
       } elseif (strlen($chr) > 1) { 
        $ret = ''; 
        foreach (str_split($chr) as $char2) { 
         if (preg_match($alnumPattern, $char2)) { 
          $ret .= $char2; 
         } 
        } 
        return $ret; 
       } else { 
        // replace whatever iconv fail to convert by something else 
        return $replacement; 
       } 
      } 
     }, 
     str_split($string) 
    ); 

    return implode($ret); 
} 
8
protected $_convertTable = array(
    '&amp;' => 'and', '@' => 'at', '©' => 'c', '®' => 'r', 'À' => 'a', 
    'Á' => 'a', 'Â' => 'a', 'Ä' => 'a', 'Å' => 'a', 'Æ' => 'ae','Ç' => 'c', 
    'È' => 'e', 'É' => 'e', 'Ë' => 'e', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i', 
    'Ï' => 'i', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o', 'Õ' => 'o', 'Ö' => 'o', 
    'Ø' => 'o', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u', 'Ü' => 'u', 'Ý' => 'y', 
    'ß' => 'ss','à' => 'a', 'á' => 'a', 'â' => 'a', 'ä' => 'a', 'å' => 'a', 
    'æ' => 'ae','ç' => 'c', 'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e', 
    'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i', 'ò' => 'o', 'ó' => 'o', 
    'ô' => 'o', 'õ' => 'o', 'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 
    'û' => 'u', 'ü' => 'u', 'ý' => 'y', 'þ' => 'p', 'ÿ' => 'y', 'Ā' => 'a', 
    'ā' => 'a', 'Ă' => 'a', 'ă' => 'a', 'Ą' => 'a', 'ą' => 'a', 'Ć' => 'c', 
    'ć' => 'c', 'Ĉ' => 'c', 'ĉ' => 'c', 'Ċ' => 'c', 'ċ' => 'c', 'Č' => 'c', 
    'č' => 'c', 'Ď' => 'd', 'ď' => 'd', 'Đ' => 'd', 'đ' => 'd', 'Ē' => 'e', 
    'ē' => 'e', 'Ĕ' => 'e', 'ĕ' => 'e', 'Ė' => 'e', 'ė' => 'e', 'Ę' => 'e', 
    'ę' => 'e', 'Ě' => 'e', 'ě' => 'e', 'Ĝ' => 'g', 'ĝ' => 'g', 'Ğ' => 'g', 
    'ğ' => 'g', 'Ġ' => 'g', 'ġ' => 'g', 'Ģ' => 'g', 'ģ' => 'g', 'Ĥ' => 'h', 
    'ĥ' => 'h', 'Ħ' => 'h', 'ħ' => 'h', 'Ĩ' => 'i', 'ĩ' => 'i', 'Ī' => 'i', 
    'ī' => 'i', 'Ĭ' => 'i', 'ĭ' => 'i', 'Į' => 'i', 'į' => 'i', 'İ' => 'i', 
    'ı' => 'i', 'IJ' => 'ij','ij' => 'ij','Ĵ' => 'j', 'ĵ' => 'j', 'Ķ' => 'k', 
    'ķ' => 'k', 'ĸ' => 'k', 'Ĺ' => 'l', 'ĺ' => 'l', 'Ļ' => 'l', 'ļ' => 'l', 
    'Ľ' => 'l', 'ľ' => 'l', 'Ŀ' => 'l', 'ŀ' => 'l', 'Ł' => 'l', 'ł' => 'l', 
    'Ń' => 'n', 'ń' => 'n', 'Ņ' => 'n', 'ņ' => 'n', 'Ň' => 'n', 'ň' => 'n', 
    'ʼn' => 'n', 'Ŋ' => 'n', 'ŋ' => 'n', 'Ō' => 'o', 'ō' => 'o', 'Ŏ' => 'o', 
    'ŏ' => 'o', 'Ő' => 'o', 'ő' => 'o', 'Œ' => 'oe','œ' => 'oe','Ŕ' => 'r', 
    'ŕ' => 'r', 'Ŗ' => 'r', 'ŗ' => 'r', 'Ř' => 'r', 'ř' => 'r', 'Ś' => 's', 
    'ś' => 's', 'Ŝ' => 's', 'ŝ' => 's', 'Ş' => 's', 'ş' => 's', 'Š' => 's', 
    'š' => 's', 'Ţ' => 't', 'ţ' => 't', 'Ť' => 't', 'ť' => 't', 'Ŧ' => 't', 
    'ŧ' => 't', 'Ũ' => 'u', 'ũ' => 'u', 'Ū' => 'u', 'ū' => 'u', 'Ŭ' => 'u', 
    'ŭ' => 'u', 'Ů' => 'u', 'ů' => 'u', 'Ű' => 'u', 'ű' => 'u', 'Ų' => 'u', 
    'ų' => 'u', 'Ŵ' => 'w', 'ŵ' => 'w', 'Ŷ' => 'y', 'ŷ' => 'y', 'Ÿ' => 'y', 
    'Ź' => 'z', 'ź' => 'z', 'Ż' => 'z', 'ż' => 'z', 'Ž' => 'z', 'ž' => 'z', 
    'ſ' => 'z', 'Ə' => 'e', 'ƒ' => 'f', 'Ơ' => 'o', 'ơ' => 'o', 'Ư' => 'u', 
    'ư' => 'u', 'Ǎ' => 'a', 'ǎ' => 'a', 'Ǐ' => 'i', 'ǐ' => 'i', 'Ǒ' => 'o', 
    'ǒ' => 'o', 'Ǔ' => 'u', 'ǔ' => 'u', 'Ǖ' => 'u', 'ǖ' => 'u', 'Ǘ' => 'u', 
    'ǘ' => 'u', 'Ǚ' => 'u', 'ǚ' => 'u', 'Ǜ' => 'u', 'ǜ' => 'u', 'Ǻ' => 'a', 
    'ǻ' => 'a', 'Ǽ' => 'ae','ǽ' => 'ae','Ǿ' => 'o', 'ǿ' => 'o', 'ə' => 'e', 
    'Ё' => 'jo','Є' => 'e', 'І' => 'i', 'Ї' => 'i', 'А' => 'a', 'Б' => 'b', 
    'В' => 'v', 'Г' => 'g', 'Д' => 'd', 'Е' => 'e', 'Ж' => 'zh','З' => 'z', 
    'И' => 'i', 'Й' => 'j', 'К' => 'k', 'Л' => 'l', 'М' => 'm', 'Н' => 'n', 
    'О' => 'o', 'П' => 'p', 'Р' => 'r', 'С' => 's', 'Т' => 't', 'У' => 'u', 
    'Ф' => 'f', 'Х' => 'h', 'Ц' => 'c', 'Ч' => 'ch','Ш' => 'sh','Щ' => 'sch', 
    'Ъ' => '-', 'Ы' => 'y', 'Ь' => '-', 'Э' => 'je','Ю' => 'ju','Я' => 'ja', 
    'а' => 'a', 'б' => 'b', 'в' => 'v', 'г' => 'g', 'д' => 'd', 'е' => 'e', 
    'ж' => 'zh','з' => 'z', 'и' => 'i', 'й' => 'j', 'к' => 'k', 'л' => 'l', 
    'м' => 'm', 'н' => 'n', 'о' => 'o', 'п' => 'p', 'р' => 'r', 'с' => 's', 
    'т' => 't', 'у' => 'u', 'ф' => 'f', 'х' => 'h', 'ц' => 'c', 'ч' => 'ch', 
    'ш' => 'sh','щ' => 'sch','ъ' => '-','ы' => 'y', 'ь' => '-', 'э' => 'je', 
    'ю' => 'ju','я' => 'ja','ё' => 'jo','є' => 'e', 'і' => 'i', 'ї' => 'i', 
    'Ґ' => 'g', 'ґ' => 'g', 'א' => 'a', 'ב' => 'b', 'ג' => 'g', 'ד' => 'd', 
    'ה' => 'h', 'ו' => 'v', 'ז' => 'z', 'ח' => 'h', 'ט' => 't', 'י' => 'i', 
    'ך' => 'k', 'כ' => 'k', 'ל' => 'l', 'ם' => 'm', 'מ' => 'm', 'ן' => 'n', 
    'נ' => 'n', 'ס' => 's', 'ע' => 'e', 'ף' => 'p', 'פ' => 'p', 'ץ' => 'C', 
    'צ' => 'c', 'ק' => 'q', 'ר' => 'r', 'ש' => 'w', 'ת' => 't', '™' => 'tm', 
); 

This is from Magento; I use it for basically everything.
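The table above is a class property, so it needs a method that applies it. A minimal sketch follows, assuming a hypothetical class UrlKeyFormatter with a format() method and only a handful of table entries reproduced; it is not Magento's actual code.

```php
<?php
// Hypothetical helper class; the table is a small excerpt for illustration.
class UrlKeyFormatter
{
    protected $_convertTable = array(
        '&amp;' => 'and', '@' => 'at',
        "\xC3\x89" => 'e',  // É
        "\xC3\xA9" => 'e',  // é
        "\xC3\x9F" => 'ss', // ß
        "\xC3\xA6" => 'ae', // æ
    );

    public function format($string)
    {
        // With an array argument, strtr() tries the longest key first and
        // never rescans text it has already replaced.
        return strtr($string, $this->_convertTable);
    }
}

$formatter = new UrlKeyFormatter();
$slugPart = $formatter->format("\xC3\x89ric Stra\xC3\x9Fe"); // "Éric Straße"
echo $slugPart, "\n";
```

Because every target in the table is lowercase, the result only becomes fully lowercase after an additional strtolower() on the remaining ASCII letters.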

+4

Pretty nice. Who is magento? – BurninLeo 2014-09-19 08:30:42

+0

magento script ... – user3682119 2014-09-23 21:42:39

+1

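This should be a built-in function in every web language, for transliterating characters that are not valid in URLs while keeping URLs readable and SEO-friendly, since the alternative is currently URL encoding, which makes URLs ugly, long and unreadable. Of course it cannot effectively support many Asian languages, but it covers most others. It is worth noting that this ugly-looking solution is much better than using iconv with //TRANSLIT, which leaves you with a lot of question marks and also requires you to know the input encoding. – ekerner 2015-01-24 17:17:48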

12

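In PHP 5.4 the intl extension provides a new class named Transliterator.

I believe this is the best way to remove diacritics, for two reasons:

  1. Transliterator is based on ICU, so you are using the tables of the ICU library. ICU is a great project, developed over many years to provide comprehensive tables and functionality. Whatever table you write yourself, it will never be as complete as the one from ICU.

  2. In UTF-8, characters can be represented in different ways. For example, the character ñ can be saved as a single (multibyte) character, or as the combination of the character ˜ (multibyte) and n. In addition, some characters in Unicode are homographs: they look the same while having different code points. For this reason it is also important to normalize the string.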
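The second point can be illustrated with the Normalizer class, which is also part of the intl extension. This is only a sketch under the assumption that intl is loaded; the variable names are illustrative.

```php
<?php
// Two encodings of "ñ": precomposed U+00F1, and "n" followed by the
// combining tilde U+0303.
$precomposed = "\xC3\xB1";
$decomposed  = "n\xCC\x83";

// The two byte strings differ even though they render the same.
$sameBytes = ($precomposed === $decomposed);

if (class_exists('Normalizer')) {
    // NFC composes, NFD decomposes; after normalizing, the forms match.
    $nfc = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    $nfd = Normalizer::normalize($precomposed, Normalizer::FORM_D);
    var_dump($nfc === $precomposed, $nfd === $decomposed);
}
```

This is why decomposing first (NFD) makes it possible to strip the accent as a separate combining mark.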

Here is some sample code, taken from an old answer of mine:

<?php 
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD); 
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto']; 
foreach($test as $e) { 
    $normalized = $transliterator->transliterate($e); 
    echo $e. ' --> '.$normalized."\n"; 
} 
?> 

Result:

abcd --> abcd 
èe --> ee 
€ --> € 
àòùìéëü --> aouieeu 
àòùìéëü --> aouieeu 
tiësto --> tiesto 

The first parameter given to the Transliterator class performs both the removal of diacritics and the normalization of the string.
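Removing nonspacing marks only affects letters that decompose into a base letter plus a combining mark, so a letter such as æ passes through unchanged. ICU's Latin-ASCII transform is one way to cover those as well. The sketch below assumes the intl extension; the function name to_ascii and its fallback behaviour are illustrative choices.

```php
<?php
/**
 * Transliterate to ASCII with ICU's "Any-Latin; Latin-ASCII" transform.
 * Returns the input unchanged when intl is missing or the call fails.
 */
function to_ascii($utf8)
{
    if (!class_exists('Transliterator')) {
        return $utf8;
    }
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        return $utf8;
    }
    $result = $transliterator->transliterate($utf8);
    return $result === false ? $utf8 : $result;
}

echo to_ascii("oliv\xC3\xA6 ti\xC3\xABsto"), "\n"; // input is "olivæ tiësto"
```

Returning the input on failure is a design choice for the sketch; raising an exception may be preferable where silent pass-through would hide problems.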

+0

Thanks. But I tried your code, and "olivæ" is still "olivæ" rather than "olivae" – 2017-01-11 15:05:55

+1

I used transliterator_transliterate('Any-Latin; Latin-ASCII', "A æ Übérmensch på høyeste nivå! И я люблю PHP! fi") to solve my problem – 2017-01-11 15:18:57

+0

Yes, `\Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', \Transliterator::FORWARD)` will do the job – sboye 2017-11-13 16:48:18

4

An updated answer based on @BurninLeo's answer:

function replace_spec_char($subject) { 
    $char_map = array(
     "ъ" => "-", "ь" => "-", "Ъ" => "-", "Ь" => "-", 
     "А" => "A", "Ă" => "A", "Ǎ" => "A", "Ą" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "Ǻ" => "A", "Ā" => "A", "א" => "A", 
     "Б" => "B", "ב" => "B", "Þ" => "B", 
     "Ĉ" => "C", "Ć" => "C", "Ç" => "C", "Ц" => "C", "צ" => "C", "Ċ" => "C", "Č" => "C", "©" => "C", "ץ" => "C", 
     "Д" => "D", "Ď" => "D", "Đ" => "D", "ד" => "D", "Ð" => "D", 
     "È" => "E", "Ę" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "Е" => "E", "Ē" => "E", "Ė" => "E", "Ě" => "E", "Ĕ" => "E", "Є" => "E", "Ə" => "E", "ע" => "E", 
     "Ф" => "F", "Ƒ" => "F", 
     "Ğ" => "G", "Ġ" => "G", "Ģ" => "G", "Ĝ" => "G", "Г" => "G", "ג" => "G", "Ґ" => "G", 
     "ח" => "H", "Ħ" => "H", "Х" => "H", "Ĥ" => "H", "ה" => "H", 
     "I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "Į" => "I", "Ĭ" => "I", "I" => "I", "И" => "I", "Ĩ" => "I", "Ǐ" => "I", "י" => "I", "Ї" => "I", "Ī" => "I", "І" => "I", 
     "Й" => "J", "Ĵ" => "J", 
     "ĸ" => "K", "כ" => "K", "Ķ" => "K", "К" => "K", "ך" => "K", 
     "Ł" => "L", "Ŀ" => "L", "Л" => "L", "Ļ" => "L", "Ĺ" => "L", "Ľ" => "L", "ל" => "L", 
     "מ" => "M", "М" => "M", "ם" => "M", 
     "Ñ" => "N", "Ń" => "N", "Н" => "N", "Ņ" => "N", "ן" => "N", "Ŋ" => "N", "נ" => "N", "ʼn" => "N", "Ň" => "N", 
     "Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "О" => "O", "Ő" => "O", "Ŏ" => "O", "Ō" => "O", "Ǿ" => "O", "Ǒ" => "O", "Ơ" => "O", 
     "פ" => "P", "ף" => "P", "П" => "P", 
     "ק" => "Q", 
     "Ŕ" => "R", "Ř" => "R", "Ŗ" => "R", "ר" => "R", "Р" => "R", "®" => "R", 
     "Ş" => "S", "Ś" => "S", "Ș" => "S", "Š" => "S", "С" => "S", "Ŝ" => "S", "ס" => "S", 
     "Т" => "T", "Ț" => "T", "ט" => "T", "Ŧ" => "T", "ת" => "T", "Ť" => "T", "Ţ" => "T", 
     "Ù" => "U", "Û" => "U", "Ú" => "U", "Ū" => "U", "У" => "U", "Ũ" => "U", "Ư" => "U", "Ǔ" => "U", "Ų" => "U", "Ŭ" => "U", "Ů" => "U", "Ű" => "U", "Ǖ" => "U", "Ǜ" => "U", "Ǚ" => "U", "Ǘ" => "U", 
     "В" => "V", "ו" => "V", 
     "Ý" => "Y", "Ы" => "Y", "Ŷ" => "Y", "Ÿ" => "Y", 
     "Ź" => "Z", "Ž" => "Z", "Ż" => "Z", "З" => "Z", "ז" => "Z", 
     "а" => "a", "ă" => "a", "ǎ" => "a", "ą" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "ǻ" => "a", "ā" => "a", "א" => "a", 
     "б" => "b", "ב" => "b", "þ" => "b", 
     "ĉ" => "c", "ć" => "c", "ç" => "c", "ц" => "c", "צ" => "c", "ċ" => "c", "č" => "c", "©" => "c", "ץ" => "c", 
     "Ч" => "ch", "ч" => "ch", 
     "д" => "d", "ď" => "d", "đ" => "d", "ד" => "d", "ð" => "d", 
     "è" => "e", "ę" => "e", "é" => "e", "ë" => "e", "ê" => "e", "е" => "e", "ē" => "e", "ė" => "e", "ě" => "e", "ĕ" => "e", "є" => "e", "ə" => "e", "ע" => "e", 
     "ф" => "f", "ƒ" => "f", 
     "ğ" => "g", "ġ" => "g", "ģ" => "g", "ĝ" => "g", "г" => "g", "ג" => "g", "ґ" => "g", 
     "ח" => "h", "ħ" => "h", "х" => "h", "ĥ" => "h", "ה" => "h", 
     "i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "į" => "i", "ĭ" => "i", "ı" => "i", "и" => "i", "ĩ" => "i", "ǐ" => "i", "י" => "i", "ї" => "i", "ī" => "i", "і" => "i", 
     "й" => "j", "Й" => "j", "Ĵ" => "j", "ĵ" => "j", 
     "ĸ" => "k", "כ" => "k", "ķ" => "k", "к" => "k", "ך" => "k", 
     "ł" => "l", "ŀ" => "l", "л" => "l", "ļ" => "l", "ĺ" => "l", "ľ" => "l", "ל" => "l", 
     "מ" => "m", "м" => "m", "ם" => "m", 
     "ñ" => "n", "ń" => "n", "н" => "n", "ņ" => "n", "ן" => "n", "ŋ" => "n", "נ" => "n", "ʼn" => "n", "ň" => "n", 
     "ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "о" => "o", "ő" => "o", "ŏ" => "o", "ō" => "o", "ǿ" => "o", "ǒ" => "o", "ơ" => "o", 
     "פ" => "p", "ף" => "p", "п" => "p", 
     "ק" => "q", 
     "ŕ" => "r", "ř" => "r", "ŗ" => "r", "ר" => "r", "р" => "r", "®" => "r", 
     "ş" => "s", "ś" => "s", "ș" => "s", "š" => "s", "с" => "s", "ŝ" => "s", "ס" => "s", 
     "т" => "t", "ț" => "t", "ט" => "t", "ŧ" => "t", "ת" => "t", "ť" => "t", "ţ" => "t", 
     "ù" => "u", "û" => "u", "ú" => "u", "ū" => "u", "у" => "u", "ũ" => "u", "ư" => "u", "ǔ" => "u", "ų" => "u", "ŭ" => "u", "ů" => "u", "ű" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u", 
     "в" => "v", "ו" => "v", 
     "ý" => "y", "ы" => "y", "ŷ" => "y", "ÿ" => "y", 
     "ź" => "z", "ž" => "z", "ż" => "z", "з" => "z", "ז" => "z", "ſ" => "z", 
     "™" => "tm", 
     "@" => "at", 
     "Ä" => "ae", "Ǽ" => "ae", "ä" => "ae", "æ" => "ae", "ǽ" => "ae", 
     "ij" => "ij", "IJ" => "ij", 
     "я" => "ja", "Я" => "ja", 
     "Э" => "je", "э" => "je", 
     "ё" => "jo", "Ё" => "jo", 
     "ю" => "ju", "Ю" => "ju", 
     "œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe", 
     "щ" => "sch", "Щ" => "sch", 
     "ш" => "sh", "Ш" => "sh", 
     "ß" => "ss", 
     "Ü" => "ue", 
     "Ж" => "zh", "ж" => "zh", 
    ); 
    return strtr($subject, $char_map); 
} 

$string = "Ħí ŧħə®ë, юßť å test!"; 
echo replace_spec_char($string); 

Ħí ŧħə®ë, юßť å test! => Hi there, jusst a test!

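This does not mix up uppercase and lowercase characters, except for the multi-letter replacements (e.g. ss, ch, sch). Also added: @ ® ©

Also, if you want to build a regular expression that matches regardless of special characters: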

rss => '[rŕřŘŗŖרŔРр](?:[sșсŜšśסşСŝ][sșсŜšśסşСŝ]|[ß])'
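A pattern like this could be generated from a table of variants instead of being written by hand. The sketch below is an assumption-laden illustration: build_loose_pattern is a made-up name, the search word is assumed to be plain ASCII, and multi-letter entries such as "ss" => "ß" are not handled.

```php
<?php
/**
 * Build a regex that matches $word with or without special characters.
 * $variants maps an ASCII letter to a string of look-alike characters.
 */
function build_loose_pattern($word, array $variants)
{
    $pattern = '';
    foreach (str_split($word) as $char) {
        if (isset($variants[$char])) {
            // Character class: the plain letter plus all of its variants.
            $pattern .= '[' . preg_quote($char . $variants[$char], '/') . ']';
        } else {
            $pattern .= preg_quote($char, '/');
        }
    }
    // The u modifier makes each multibyte variant count as one character.
    return '/' . $pattern . '/u';
}

$variants = array(
    'r' => "\xC5\x95\xC5\x99", // ŕ ř
    'e' => "\xC3\xA9\xC3\xAB", // é ë
);
$pattern = build_loose_pattern('there', $variants);
$matches = preg_match($pattern, "the\xC5\x99\xC3\xAB"); // subject is "theřë"
```

Feeding it the base list further down, rather than this two-entry excerpt, would give the full set of variants.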

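A Vala implementation of this: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477

This is the base list you can work with. Using regex replace (in Sublime Text) or a small script, you can build whatever you need from this array.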

"-" => "ъьЪЬ", 
"A" => "АĂǍĄÀÃÁÆÂÅǺĀא", 
"B" => "БבÞ", 
"C" => "ĈĆÇЦצĊČ©ץ", 
"D" => "ДĎĐדÐ", 
"E" => "ÈĘÉËÊЕĒĖĚĔЄƏע", 
"F" => "ФƑ", 
"G" => "ĞĠĢĜГגҐ", 
"H" => "חĦХĤה", 
"I" => "IÏÎÍÌĮĬIИĨǏיЇĪІ", 
"J" => "ЙĴ", 
"K" => "ĸכĶКך", 
"L" => "ŁĿЛĻĹĽל", 
"M" => "מМם", 
"N" => "ÑŃНŅןŊנʼnŇ", 
"O" => "ØÓÒÔÕОŐŎŌǾǑƠ", 
"P" => "פףП", 
"Q" => "ק", 
"R" => "ŔŘŖרР®", 
"S" => "ŞŚȘŠСŜס", 
"T" => "ТȚטŦתŤŢ", 
"U" => "ÙÛÚŪУŨƯǓŲŬŮŰǕǛǙǗ", 
"V" => "Вו", 
"Y" => "ÝЫŶŸ", 
"Z" => "ŹŽŻЗז", 
"a" => "аăǎąàãáæâåǻāא", 
"b" => "бבþ", 
"c" => "ĉćçцצċč©ץ", 
"ch" => "ч", 
"d" => "дďđדð", 
"e" => "èęéëêеēėěĕєəע", 
"f" => "фƒ", 
"g" => "ğġģĝгגґ", 
"h" => "חħхĥה", 
"i" => "iïîíìįĭıиĩǐיїīі", 
"j" => "йĵ", 
"k" => "ĸכķкך", 
"l" => "łŀлļĺľל", 
"m" => "מмם", 
"n" => "ñńнņןŋנʼnň", 
"o" => "øóòôõоőŏōǿǒơ", 
"p" => "פףп", 
"q" => "ק", 
"r" => "ŕřŗרр®", 
"s" => "şśșšсŝס", 
"t" => "тțטŧתťţ", 
"u" => "ùûúūуũưǔųŭůűǖǜǚǘ", 
"v" => "вו", 
"y" => "ýыŷÿ", 
"z" => "źžżзזſ", 
"tm" => "™", 
"at" => "@", 
"ae" => "ÄǼäæǽ", 
"ch" => "Чч", 
"ij" => "ijIJ", 
"j" => "йЙĴĵ", 
"ja" => "яЯ", 
"je" => "Ээ", 
"jo" => "ёЁ", 
"ju" => "юЮ", 
"oe" => "œŒöÖ", 
"sch" => "щЩ", 
"sh" => "шШ", 
"ss" => "ß", 
"tm" => "™", 
"ue" => "Ü", 
"zh" => "Жж" 
2

If you have http://php.net/manual/en/book.intl.php available, this will solve your problem:

$string = "Éric Cantona"; 
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD); 
echo $normalized = $transliterator->transliterate($string); 
+0

If you also want to replace other characters such as 'æ', you can use `\Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', \Transliterator::FORWARD)` instead – sboye 2017-11-13 16:51:15

4

This worked for me:

<?php 
setlocale(LC_ALL, "en_US.utf8"); 
$val = iconv('UTF-8','ASCII//TRANSLIT',$val); 
?> 
2

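I searched around, and your accent-stripping idea is really nice and cost-effective, but your regular expressions are written incorrectly and are missing two extra flags. Long story short, the patterns must be: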

$patterns[0] = '/[áâàåä]/ui'; 
$patterns[1] = '/[ðéêèë]/ui'; 
$patterns[2] = '/[íîìï]/ui'; 
$patterns[3] = '/[óôòøõö]/ui'; 
$patterns[4] = '/[úûùü]/ui'; 
$patterns[5] = '/æ/ui'; 
$patterns[6] = '/ç/ui'; 
$patterns[7] = '/ß/ui'; 
$replacements[0] = 'a'; 
$replacements[1] = 'e'; 
$replacements[2] = 'i'; 
$replacements[3] = 'o'; 
$replacements[4] = 'u'; 
$replacements[5] = 'ae'; 
$replacements[6] = 'c'; 
$replacements[7] = 'ss'; 

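As you can see it is very similar, but the most important part is the flags after the second slash of the regular expression. When a regular expression looks like /[someCoolRegex]/ui, the u specifies that it must use Unicode and the i specifies case-insensitive matching. I have tested this myself, and compared with the answers here I must say it is more cost-effective than using strtr.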
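The effect of the u flag can be seen in a small comparison. This is a sketch that assumes UTF-8 input; the bytes are written as escapes so the result does not depend on how the script file is saved.

```php
<?php
$input = "\xC3\xA9ric cantona"; // "éric cantona" in UTF-8

// The class [éêèë], spelled out as UTF-8 bytes.
$class = "/[\xC3\xA9\xC3\xAA\xC3\xA8\xC3\xAB]/";

// Without u the class is a set of single bytes, so each of the two bytes
// of "é" is matched and replaced on its own.
$byteWise = preg_replace($class, 'e', $input);

// With u the class is a set of code points, so "é" is replaced as a whole.
$charWise = preg_replace($class . 'u', 'e', $input);

var_dump($byteWise === $charWise);
```

Only the u variant yields a single "e" for the accented letter.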

Hope someone reads this answer.

+0

Works great, probably the best answer on this post. – LamaDelRay 2018-01-29 09:23:34