2015-04-03

Answer

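Windows uses UTF-16 as its native string type. UTF-16 handles code points up to U+10FFFF, encoding code points above U+FFFF as surrogate pairs (two 16-bit code units each).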
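
As a worked example of the surrogate-pair arithmetic, here is a minimal sketch that splits one code point above U+FFFF into its two UTF-16 code units and recombines them. The names `SurrogatePair`, `ToSurrogates`, and `FromSurrogates` are hypothetical, chosen only for illustration, and the sketch assumes a C++11 compiler for `char16_t` and `char32_t`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration: one UTF-16 surrogate pair.
struct SurrogatePair {
    char16_t high; // lead unit, 0xD800..0xDBFF
    char16_t low;  // trail unit, 0xDC00..0xDFFF
};

// Split a code point in U+10000..U+10FFFF into a surrogate pair.
inline SurrogatePair ToSurrogates(char32_t cp)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    // Subtracting 0x10000 leaves a 20-bit value.
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    SurrogatePair p;
    p.high = static_cast<char16_t>(0xD800 + (v >> 10));   // top 10 bits
    p.low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF)); // bottom 10 bits
    return p;
}

// Recombine a surrogate pair into the original code point.
inline char32_t FromSurrogates(SurrogatePair p)
{
    const std::uint32_t hi = static_cast<std::uint32_t>(p.high) - 0xD800;
    const std::uint32_t lo = static_cast<std::uint32_t>(p.low) - 0xDC00;
    return static_cast<char32_t>(0x10000 + (hi << 10) + lo);
}
```

For U+1F600, the 20-bit value is 0xF600, so the high unit is 0xD800 + 0x3D = 0xD83D and the low unit is 0xDC00 + 0x200 = 0xDE00.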

Windows has no concept of UTF-32, so you have to do one of the following:

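  1. If you are using C++11 or later, it has native std::u16string and std::u32string types, plus the std::codecvt family of classes for converting data between UTF-8, UTF-16, and UTF-32 (note that std::wstring_convert and the <codecvt> header are deprecated as of C++17).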

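    #include <codecvt>  // std::codecvt_utf16 (deprecated since C++17) 
    #include <locale>   // std::wstring_convert 
    #include <string> 
    
    std::u16string Utf32ToUtf16(const std::u32string &codepoints) 
    { 
        // Serialize the code points as little-endian UTF-16 bytes. 
        std::wstring_convert< 
            std::codecvt_utf16<char32_t, 0x10ffff, std::little_endian>, 
            char32_t> conv; 
        std::string bytes = conv.to_bytes(codepoints); 
        // Reinterpret the bytes as UTF-16 code units (assumes a little-endian host, as on Windows). 
        return std::u16string(reinterpret_cast<const char16_t*>(bytes.data()), bytes.length()/sizeof(char16_t)); 
    } 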
    
  2. If you are using an earlier version of C/C++, you will have to convert from UTF-32 to UTF-16 manually:

    #include <stdint.h>  // uint32_t 
    #include <string> 
    #include <vector> 
    
    // On Windows, wchar_t is 2 bytes, suitable for UTF-16. 
    std::wstring Utf32ToUtf16(const std::vector<uint32_t> &codepoints) 
    { 
        std::wstring result; 
        int len = 0; 
    
        // First pass: count the UTF-16 code units needed. 
        for (std::vector<uint32_t>::const_iterator iter = codepoints.begin(); iter != codepoints.end(); ++iter) 
        { 
            uint32_t cp = *iter; 
            if (cp < 0x10000) { 
                ++len; 
            } 
            else if (cp <= 0x10FFFF) { 
                len += 2;  // needs a surrogate pair 
            } 
            else { 
                // invalid code point; it is replaced with U+FFFD below 
                ++len; 
            } 
        } 
    
        if (len > 0) 
        { 
            result.resize(len); 
            len = 0; 
    
            // Second pass: write the code units. 
            for (std::vector<uint32_t>::const_iterator iter = codepoints.begin(); iter != codepoints.end(); ++iter) 
            { 
                uint32_t cp = *iter; 
                if (cp < 0x10000) { 
                    result[len++] = static_cast<wchar_t>(cp); 
                } 
                else if (cp <= 0x10FFFF) { 
                    cp -= 0x10000; 
                    result[len++] = static_cast<wchar_t>((cp >> 10) + 0xD800);    // high surrogate 
                    result[len++] = static_cast<wchar_t>((cp & 0x3FF) + 0xDC00);  // low surrogate 
                } 
                else { 
                    result[len++] = static_cast<wchar_t>(0xFFFD);  // replacement character 
                } 
            } 
        } 
    
        return result; 
    } 
    
  3. Use a third-party library, such as libiconv or ICU.
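
For comparison with the manual approach in option 2, here is a single-pass sketch that appends code units as it goes instead of counting first. It is only an illustration under stated assumptions: it requires C++11 for std::u32string and std::u16string, the name `EncodeUtf16` is hypothetical, and it treats code points in the surrogate range U+D800..U+DFFF, as well as anything above U+10FFFF, as invalid input and substitutes U+FFFD.

```cpp
#include <string>

// Hypothetical name; single-pass UTF-32 to UTF-16 conversion (requires C++11).
inline std::u16string EncodeUtf16(const std::u32string &codepoints)
{
    std::u16string out;
    // Lower bound on the result size; the string grows when pairs are needed.
    out.reserve(codepoints.size());

    for (char32_t cp : codepoints) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Surrogate code points are not valid on their own (assumed policy: replace).
            out.push_back(static_cast<char16_t>(0xFFFD));
        } else if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            // Supplementary planes: one surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        } else {
            // Beyond the Unicode range (assumed policy: replace).
            out.push_back(static_cast<char16_t>(0xFFFD));
        }
    }
    return out;
}
```

With this sketch, `EncodeUtf16(U"A\U0001F600")` yields three code units: 0x0041, 0xD83D, 0xDE00. Appending trades the exact up-front allocation of the two-pass version for simpler code.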
