String literal to basic_string<unsigned char>

When it comes to internationalization and Unicode, I'm an idiot American programmer. Here's the deal.

#include <string> 
using namespace std; 

typedef basic_string<unsigned char> ustring; 

int main() 
{ 
    static const ustring my_str = "Hello, UTF-8!"; // <== error here 
    return 0; 
}

This produces an unsurprising complaint:

cannot convert from 'const char [14]' to 'std::basic_string<_Elem>'

Maybe I've had the wrong amount of caffeine today. How do I fix this? Can I keep the basic structure:

ustring something = {insert magic incantation here};

?

Source

2010-09-30 John Dibling

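This doesn't answer your question, but read this article on i18n: http://www.joelonsoftware.com/articles/Unicode.html – Starkey 2010-09-30 20:36:41

Already read it, but thx – 2010-09-30 20:39:34

You will probably need to provide your own `char_traits<unsigned char>` specialization. AFAIK, the standard library only provides specializations for `char` and `wchar_t`. – Praetorian 2010-09-30 20:44:08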
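To illustrate the comment above, here is a minimal sketch of a user-defined traits class. The name `uchar_traits` is hypothetical and not part of any library. It is passed as the second template argument rather than specializing `std::char_traits` itself, on the assumption that one does not want to rely on the library supplying a generic `char_traits` for `unsigned char` (the standard does not guarantee one).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>   // std::mbstate_t
#include <ios>      // std::streamoff, std::streampos
#include <string>

// Hypothetical traits class for unsigned char (illustration only).
struct uchar_traits {
    using char_type  = unsigned char;
    using int_type   = int;
    using off_type   = std::streamoff;
    using pos_type   = std::streampos;
    using state_type = std::mbstate_t;

    static void assign(char_type& dst, const char_type& src) noexcept { dst = src; }
    static bool eq(char_type a, char_type b) noexcept { return a == b; }
    static bool lt(char_type a, char_type b) noexcept { return a < b; }

    // Byte-wise comparison; memcmp already compares as unsigned char.
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    // Length of a NUL-terminated sequence.
    static std::size_t length(const char_type* s) {
        return std::strlen(reinterpret_cast<const char*>(s));
    }
    static const char_type* find(const char_type* s, std::size_t n, const char_type& c) {
        return n == 0 ? nullptr
                      : static_cast<const char_type*>(std::memchr(s, c, n));
    }
    // move() must tolerate overlapping ranges, copy() need not.
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n);
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n);
        return dst;
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        if (n != 0) std::memset(dst, c, n);
        return dst;
    }

    static int_type eof() noexcept { return -1; }
    static int_type not_eof(int_type c) noexcept { return c == eof() ? 0 : c; }
    static char_type to_char_type(int_type c) noexcept { return static_cast<char_type>(c); }
    static int_type to_int_type(char_type c) noexcept { return c; }
    static bool eq_int_type(int_type a, int_type b) noexcept { return a == b; }
};

// Same name as the typedef in the question, but with explicit traits.
using ustring = std::basic_string<unsigned char, uchar_traits>;
```

With this in place, `ustring s(reinterpret_cast<const unsigned char*>("Hello, UTF-8!"));` should construct as usual. The string literal still needs a cast, so this only addresses the traits side of the problem.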

Narrow string literals are defined to be arrays of const char, and there are no unsigned string literals[1], so you'll have to cast:

ustring s = reinterpret_cast<const unsigned char*>("Hello, UTF-8");

Of course, you can wrap that long thing in an inline function:

inline const unsigned char *uc_str(const char *s){ 
    return reinterpret_cast<const unsigned char*>(s); 
} 

ustring s = uc_str("Hello, UTF-8");

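Or you can just use basic_string<char> and get away with it 99.9% of the time you're dealing with UTF-8.

[1] Unless char is unsigned, but whether it is or not is implementation-defined, blah, blah.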
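As a rough illustration of the basic_string<char> route: a plain std::string can hold UTF-8 bytes, and a cast to unsigned char is only needed at the point where byte values are inspected. The sketch below assumes well-formed UTF-8 and does no validation; `utf8_length` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumption: `s` holds well-formed UTF-8; nothing is validated here.
inline std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (char c : s) {
        // Cast so the bit test does not depend on whether char is signed.
        const unsigned char byte = static_cast<unsigned char>(c);
        if ((byte & 0xC0u) != 0x80u) {
            ++count;  // lead byte or ASCII byte starts a new code point
        }
    }
    return count;
}
```

For example, `std::string("na\xC3\xAFve")` stores 6 bytes, while `utf8_length` reports 5 code points for it.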

Source

2010-09-30 20:50:59

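I *think* this is the answer... – 2010-09-30 20:56:05

@Steve, I know this is old, but I'm curious: when is basic_string<char> not suitable for storing UTF-8 encoded strings? It just stores a sequence of bytes, and that has never failed me. Is there a corner case I'm not aware of? – Matthew 2017-09-13 19:57:32

Using different character types for different encodings has the advantage that the compiler barks at you when you mix them up. The downside is that you have to convert manually.

A few helper functions to the rescue: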

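// Copies the bytes of a system-encoded std::string; no transcoding is done.
inline ustring convert(const std::string& sys_enc) { 
    return ustring(sys_enc.begin(), sys_enc.end()); 
} 

// For char arrays of known size. N - 1 drops the terminating NUL,
// assuming the array holds a NUL-terminated string (e.g. a literal).
template< std::size_t N > 
inline ustring convert(const char (&array)[N]) { 
    return ustring(array, array + N - 1); 
} 

// For NUL-terminated C strings: reinterpret the bytes as unsigned char.
inline ustring convert(const char* pstr) { 
    return ustring(reinterpret_cast<const ustring::value_type*>(pstr)); 
}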

Of course, all of these fail silently and fatally when the string being converted contains anything other than ASCII.
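One way to make that failure loud instead of silent is to check the input first. This is only a sketch under assumptions: `is_ascii` and `convert_checked` are hypothetical names that do not appear in the answer, and `ustring` is the typedef from the question, which relies on the library providing a generic `char_traits` for `unsigned char` (the standard does not guarantee that).

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>

typedef std::basic_string<unsigned char> ustring;  // as in the question

// True when every byte is in the 7-bit ASCII range (0x00..0x7F).
inline bool is_ascii(const std::string& s) {
    return std::all_of(s.begin(), s.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}

// Hypothetical checked variant of convert(): throws instead of
// silently copying bytes whose meaning depends on the system encoding.
inline ustring convert_checked(const std::string& sys_enc) {
    if (!is_ascii(sys_enc)) {
        throw std::invalid_argument(
            "convert_checked: input contains non-ASCII bytes");
    }
    return ustring(sys_enc.begin(), sys_enc.end());
}
```

Rejecting non-ASCII input is deliberately conservative. If the source is already known to be UTF-8, the plain byte copy is fine and the check can be skipped.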

Source

2010-09-30 22:52:33 sbi

Somehow I can't use the third overload of `convert`. I get the following compile error: `error: cast from 'const char*' to 'std::__cxx11::basic_string::value_type {aka unsigned char}' loses precision [-fpermissive] return ustring(reinterpret_cast(pstr));`. [coliru link](http://coliru.stacked-crooked.com/a/66b1d6c08a1ad63e) – Patryk 2016-02-22 15:25:45

@Patryk: I believe I've fixed this. Sorry, I got that wrong a long time ago. – sbi 2016-02-22 15:28:55

That's what we're here for :) – Patryk 2016-02-22 15:33:20

Make your life easier and use a UTF-8 string library (such as http://utfcpp.sourceforge.net/), or use std::wstring and UTF-16. You may also be interested in the discussion in another Stack Overflow question: C++ strings: UTF-8 or 16-bit encoding?

Source

2010-09-30 23:12:57 Matthew

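Can't use UTF-16. The incoming file is UTF-8. – 2010-10-01 14:55:06

I guess the next question is: once the file is loaded, what do you need to do with its data? Converting it to UTF-16 might make sense, or keeping it as UTF-8 might be easier and more efficient. – Matthew 2010-10-01 15:16:21

UTF-16 has no real advantage over UTF-8. In fact, the only two I can think of are A) it's the native Unicode encoding of Windows, so it makes things easier when you're doing Windows work, and B) when you use lots of those (CJK) characters that take three bytes in UTF-8 but only two in UTF-16, UTF-16 needs less memory. – sbi 2010-10-01 20:12:35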
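The byte counts in the comment above follow from the code point ranges of the two encodings. A small sketch of the arithmetic, where `utf8_bytes` and `utf16_bytes` are hypothetical helper names and the input is assumed to be a valid Unicode scalar value:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to encode one code point in UTF-8.
// Assumption: cp is a valid Unicode scalar value (not checked).
constexpr std::size_t utf8_bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed to encode one code point in UTF-16:
// one 16-bit unit inside the BMP, a surrogate pair outside it.
constexpr std::size_t utf16_bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}
```

For U+6F22 (a CJK ideograph) this gives 3 bytes in UTF-8 versus 2 in UTF-16, while for ASCII such as U+0041 it is 1 byte versus 2, so which encoding is smaller depends on the text.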
