Counting the number of characters in a string that mixes ASCII and Unicode

strlen($username);

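The username can contain ASCII, Unicode, or both.

Examples:

Jam123 (ASCII) - 6 characters
ابت (Unicode) - 3 characters, but strlen returns 6 because it counts bytes and each of these characters takes 2 bytes.
Jamت (Unicode and ASCII) - strlen returns 5 (3 for the ASCII letters and 2 for the Unicode one, even though I have only one Unicode character)

In all cases the username must not be longer than 25 characters or shorter than 4.

My main problem is when ASCII and Unicode are mixed: how can I keep track of the count so that the conditional statement can decide whether the username is no more than 25 and no fewer than 4 characters?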

if(strlen($username) <= 25 && !(strlen($username) < 4))

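Three Unicode characters are counted as 6 bytes, which causes trouble: it lets a user have a username of 3 Unicode characters when the minimum should be 4 characters.

Numbers will always be in ASCII.

Source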

2011-09-03 user311509

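All ASCII is Unicode. Not all Unicode is ASCII. – tchrist

@tchrist All ASCII is **UTF-8**. Not all **UTF-8** is ASCII. Unicode is neither. – deceze

@user This might be a good read for you: [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/) – deceze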

Use mb_strlen(). It counts characters rather than bytes, so it handles Unicode characters.

Example:

mb_strlen("Jamت", "UTF-8"); // 4
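
Applied to the length rule from the question, the check might look like the sketch below. The helper name isValidUsernameLength is hypothetical (it does not appear in the original post), and the sketch assumes the username arrives as a UTF-8 string and that the mbstring extension is enabled.

```php
<?php
// Hypothetical helper, not part of the original post: applies the
// "at least 4, at most 25" rule by counting characters instead of bytes.
// Assumes $username is encoded as UTF-8.
function isValidUsernameLength(string $username): bool
{
    $length = mb_strlen($username, 'UTF-8');
    return $length >= 4 && $length <= 25;
}

var_dump(isValidUsernameLength('Jam123')); // bool(true):  6 characters
var_dump(isValidUsernameLength('ابت'));    // bool(false): 3 characters, although strlen() reports 6
var_dump(isValidUsernameLength('Jamت'));   // bool(true):  4 characters, although strlen() reports 5
```

Comparing the character count directly against both bounds also avoids the negated `!(... < 4)` form used in the question.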

Source

2011-09-03 21:41:12 arnaud576875

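Surprisingly, it works! I tried this solution before but it didn't work... I guess I had a typo or something... anyway, a small fix with mb_strlen() did it... thanks – user311509

You can use mb_strlen with the encoding of your choice.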

http://sandbox.phpcode.eu/g/3a144/1

<?php 
echo mb_strlen('ابت', 'UTF8'); // returns 3
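
To see where the two functions diverge, the three usernames from the question can be run through both. This is only an illustrative sketch: the helper name byteAndCharLength is made up for this example, and it assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper for illustration: returns the byte count from strlen()
// and the character count from mb_strlen() for the same string.
// Assumes $text is encoded as UTF-8.
function byteAndCharLength(string $text): array
{
    return array(
        'bytes' => strlen($text),
        'chars' => mb_strlen($text, 'UTF-8'),
    );
}

foreach (array('Jam123', 'ابت', 'Jamت') as $name) {
    $len = byteAndCharLength($name);
    printf("%s: %d bytes, %d characters\n", $name, $len['bytes'], $len['chars']);
}
// Jam123: 6 bytes, 6 characters
// ابت: 6 bytes, 3 characters
// Jamت: 5 bytes, 4 characters
```

The counts only differ once a character outside ASCII appears, which is why a pure-ASCII username never exposed the problem.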

Source

2011-09-03 21:42:26 genesis

A function to count the words in a Unicode sentence/string:

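function mb_count_words($string)
{
    // Count runs of letters (\pL), digits (\pN) and dashes (\p{Pd}); the /u flag treats the input as UTF-8.
    preg_match_all('/[\pL\pN\p{Pd}]+/u', $string, $matches);
    return count($matches[0]);
}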

or

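// $charlist is unused; it is kept only so the signature stays as originally posted.
function mb_count_words($string, $format = 0, $charlist = '[]') {
    $string = trim($string);
    // Compare against '' instead of using empty(), which would also treat "0" as empty.
    if ($string === '') {
        $words = array();
    } else {
        // Split on runs of anything that is not a letter, a digit, or an apostrophe,
        // dropping the empty pieces produced by leading or trailing punctuation.
        $words = preg_split('~[^\p{L}\p{N}\']+~u', $string, -1, PREG_SPLIT_NO_EMPTY);
    }
    // $format 0 returns the number of words; any other value returns the words themselves.
    return $format == 0 ? count($words) : $words;
}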

Then run:

echo mb_count_words("chào buổi sáng");

Source

2013-12-24 15:20:58
