C++: Converting UTF-8 to wstring with iconv

I have the following C++ code running on Linux:

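#include <cerrno>
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <iconv.h>
#include <iostream>

int main()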
{ 
    using namespace std; 
    char str[] = "¡Hola!"; 

    wchar_t wstr[50]; 

    size_t rc; 

    memset(wstr, 0, sizeof(wstr)); 

    rc = mbstowcs(wstr, str, 50); 

    cout << "mbstowcs results: "; 
    cout << "rc = " << rc << endl; 
    cout << "str:" << str << endl; 
    wcout << L"wstr:" << wstr << endl; 
    setlocale(LC_CTYPE,""); 
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8"); 
    cout << "iconv_open errno = "<< errno << endl; 

    char *s = str; 
    char *t = (char *)wstr; 
    size_t s1 = strlen(str); 
    size_t s2 = 50; 

    rc = iconv(cd, &s, &s1, &t, &s2); 

    cout << "iconv results: "; 
    cout << "rc = " << rc << endl; 
    cout << "str:" << str << endl; 
    wcout << L"wstr:" << wstr << endl; 

}

I want to convert a UTF-8 char array into a wstring, but the code above produces this output:

mbstowcs results: rc = 18446744073709551615
str:¡Hola!
wstr:
iconv_open errno = 2
iconv results: rc = 0
str:¡Hola!
wstr:�Hola!

With iconv, the first character comes out as a different character.

Note: if I replace WCHAR_T with UCS-4-INTERNAL, wstr ends up empty.

Any help?

Thanks!


2011-03-30 gln

Just a note on portability: don't assume that wchar_t is 32 bits (wide enough to hold UCS-4). – ognian 2011-03-30 07:09:08
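To illustrate the comment: with MSVC on Windows, wchar_t is 16 bits and holds UTF-16 code units, so code points above U+FFFF need two of them. If you want exactly one unit per code point on every platform, C++11's char32_t (and std::u32string) is guaranteed to be at least 32 bits. Below is a minimal hand-written decoder sketch; `decode_utf8` is a hypothetical helper name and this technique is swapped in for illustration, not taken from the thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 into std::u32string (one char32_t per
// code point, regardless of how wide wchar_t is on the platform).
// Rejects bad lead/continuation bytes, truncated sequences, overlong forms,
// surrogates and values above U+10FFFF.
std::u32string decode_utf8(const std::string& in)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        // Smallest code point that legitimately needs `len` bytes.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::runtime_error("invalid UTF-8 code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the question's string, `decode_utf8("\xC2\xA1Hola!")` should yield six code points, the first being U+00A1.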

As another side note: when referring to a string constant (i.e. your 'str'), you should declare it 'const' so you don't accidentally modify it. – Mario 2011-03-30 09:46:57
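One wrinkle with a const input: POSIX declares iconv()'s input parameter as `char **`, so a const buffer cannot be passed directly. The sketch below, a hypothetical helper `utf8_to_wstring` that assumes glibc's "WCHAR_T" encoding name, copies the input into a mutable buffer, checks iconv_open() against `(iconv_t)-1` before reading errno (in the question errno is printed without that check, so the 2 it shows may be stale), and passes buffer sizes in bytes rather than characters.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: UTF-8 std::string -> std::wstring via iconv.
// Assumes glibc, which accepts "WCHAR_T" as the name of wchar_t's encoding.
std::wstring utf8_to_wstring(const std::string& in)
{
    if (in.empty())
        return std::wstring();

    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    // Check the return value first; errno is only meaningful after a failure.
    if (cd == (iconv_t)-1)
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));

    // iconv() takes a non-const char**, so convert from a mutable copy.
    std::vector<char> src(in.begin(), in.end());
    // Each UTF-8 byte yields at most one wide character.
    std::vector<wchar_t> dst(in.size());

    char* inbuf = src.data();
    std::size_t inleft = src.size();
    char* outbuf = reinterpret_cast<char*>(dst.data());
    std::size_t outleft = dst.size() * sizeof(wchar_t);  // sizes are in bytes

    std::size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    int saved_errno = errno;  // iconv_close() may overwrite errno
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error(std::string("iconv: ") + std::strerror(saved_errno));

    // Bytes written divided by sizeof(wchar_t) gives the character count.
    std::size_t written = (dst.size() * sizeof(wchar_t) - outleft) / sizeof(wchar_t);
    return std::wstring(dst.data(), written);
}
```

Because the output size is computed from sizeof(wchar_t), the helper does not hard-code a 32-bit wchar_t; the encoding name "WCHAR_T" itself is the glibc-specific assumption.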

Is using Boost an option?

http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/codecvt.html
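The Boost page above describes a UTF-8 codecvt facet. As an alternative that needs no Boost, C++11 added a comparable facet to the standard library, std::codecvt_utf8, which can be driven through std::wstring_convert. A minimal sketch follows; `utf8_to_wstring_std` is a hypothetical helper name. Both standard classes are deprecated since C++17, so expect compiler warnings, and with a 16-bit wchar_t the facet produces UCS-2, so code points above U+FFFF cannot be represented.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: UTF-8 -> std::wstring using the C++11 codecvt facet.
// Unlike mbstowcs(), this does not depend on the C locale set by setlocale().
// Deprecated since C++17, but still available in common implementations.
std::wstring utf8_to_wstring_std(const std::string& in)
{
    // from_bytes() throws std::range_error on invalid UTF-8 input.
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(in);
}
```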


2011-03-30 08:51:22 Naszta

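Without looking at the iconv documentation (I've never used it so far), I'd expect that your input (char str[] = "¡Hola!";) isn't encoded as a multibyte string; more likely it's a plain ANSI string that represents '¡' in the local/current code page. In other words, in your existing string (a const char[]), '¡' is stored in a single byte with a value above 127. mbstowcs(), however, would expect a proper '¡' to take 2 bytes (I haven't checked this right now), and the value your '¡' uses might not even be expected/allowed.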
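One way to test this guess is to look at the bytes the compiler actually stored. A small sketch with a hypothetical `hex_bytes` helper: a single 0xA1 byte would point to a Latin-1-style code page, while the pair 0xC2 0xA1 means the literal is UTF-8. Which one you get depends on the source file's encoding and the compiler's execution character set (GCC's -fexec-charset defaults to UTF-8).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render the raw bytes of a C string as lowercase hex,
// separated by spaces, to see how a literal such as "¡Hola!" was encoded.
std::string hex_bytes(const char* s)
{
    std::string out;
    char buf[4];  // two hex digits plus the terminating NUL
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(s); *p; ++p) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(*p));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `std::puts(hex_bytes(str).c_str());` in the question's main() would show whether '¡' occupies one byte or two.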

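I'd expect an error to happen there, because mbstowcs() should return the number of characters in the converted string, but 18446744073709551615 is far too large for that: it is (size_t)-1 on a 64-bit system, the value mbstowcs() returns when it encounters an invalid multibyte sequence. If that's the case, you should be able to use iconv correctly once you define your own wide string with the proper text and use that instead (wchar_t wstr[] = L"¡Hola!";).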
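The (size_t)-1 result also fits the call order in the question: setlocale(LC_CTYPE, "") runs only after mbstowcs(), so the conversion happens in the default "C" locale, which typically does not decode UTF-8. A sketch of checking for that error value, with the locale set before the call; `mb_to_wstring` is a hypothetical name, and it assumes the environment provides a UTF-8 locale (e.g. LANG=en_US.UTF-8).

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: convert a multibyte string in the current C locale's
// encoding (LC_CTYPE) to std::wstring. Call setlocale(LC_CTYPE, "") with a
// UTF-8 environment BEFORE using it, otherwise non-ASCII input may fail.
// Returns false when mbstowcs() reports an invalid sequence with (size_t)-1.
bool mb_to_wstring(const char* in, std::wstring& out)
{
    // With a null destination, mbstowcs() only counts the wide characters
    // (POSIX/glibc behavior).
    std::size_t n = std::mbstowcs(nullptr, in, 0);
    if (n == static_cast<std::size_t>(-1))
        return false;
    std::vector<wchar_t> buf(n + 1);
    std::mbstowcs(buf.data(), in, n + 1);  // also writes the terminating L'\0'
    out.assign(buf.data(), n);
    return true;
}
```

In the question's program this means calling `setlocale(LC_CTYPE, "")` once at the top of main(), before any conversion.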


2011-03-30 09:43:06 Mario
