Restoring runtime Unicode strings

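I'm building an application that receives runtime strings containing encoded Unicode over TCP. An example string is "\u7cfb\u8eca\u4e21\uff1a\u6771\u5317...". I have the code below, but it can only work at compile time: it fails with "incomplete universal character name \u", because at compile time the compiler expects four hex characters after the \u.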

QString restoreUnicode(QString strText)
{
    QRegExp rx("\\\\u([0-9a-z]){4}");
    // Fails to compile: "\u" in a string literal must be followed by four hex digits.
    return strText.replace(rx, QString::fromUtf8("\u\\1"));
}

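I'm looking for a runtime solution. I can foresee breaking these strings apart, converting the hex after each "\u" delimiter to base 10, and then passing the values to the QChar constructor, but I'm looking for a better approach: I'm quite concerned about the time complexity this would bring, and I'm not an expert.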

Does anyone have any solutions or hints?


2011-11-18 Will

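Does 'fromUtf8("\\u\\1")' work? Also, your idea has the same problem as this attempt: 'const char razy[] = "lass"; crazy Foo { int a; bool b; };' (escapes, like keywords, are handled by the compiler and can't be assembled from runtime data).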

Why not use 'QDataStream' to encode/decode the data sent over the socket?

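I have no control over the server; I'm just working with a third-party data stream as it is, a mix of ASCII with occasional embedded Unicode. I've come up with a good solution and will post it in about 6 hours, when this site's answer timer for your own question expires. – Will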

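For closure, and for anyone who runs into this thread in the future: this is my initial solution, before optimizing the scope of these variables. I'm not a fan of it, but it works given the unpredictable mix of Unicode and/or ASCII in a stream I have no control over (I only write the client), and since Unicode appears only rarely, it's fine for handling the ugly \u123 etc.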

QString restoreUnicode(QString strText)
{
    // A literal backslash, 'u', then exactly four hex digits (either case).
    QRegExp rxUnicode("\\\\u([0-9a-fA-F]{4})");

    bool bSuccessFlag;
    int iSafetyOffset = 0;
    int iNeedle = strText.indexOf(rxUnicode, iSafetyOffset);

    while (iNeedle != -1)
    {
        // The hex digits start two characters after the backslash.
        QChar cCodePoint(strText.mid(iNeedle + 2, 4).toInt(&bSuccessFlag, 16));

        if (bSuccessFlag)
            strText.replace(iNeedle, 6, QString(cCodePoint)); // replace only this escape, in place

        // Resume just past this position: hops over an unparsable match
        // (avoiding a lock) and skips text that has already been decoded.
        iSafetyOffset = iNeedle + 1;

        iNeedle = strText.indexOf(rxUnicode, iSafetyOffset);
    }

    return strText;
}


2011-11-18 23:08:47 Will

You should decode the string yourself. Take each Unicode entry (rx.indexIn(strText)), parse it, and replace the original \uXXXX substring with (wchar_t)result.


2011-11-18 14:45:19 Vlad

I've done something similar. Unicode appears only rarely, so I hope my solution won't cause worrying CPU usage. I'll post it once the site's 6-hour wait is over. – Will

#include <assert.h> 
#include <iostream> 
#include <string> 
#include <sstream> 
#include <locale> 
#include <codecvt>   // C++11 
using namespace std; 

int main() 
{ 
    char const data[] = "\\u7cfb\\u8eca\\u4e21\\uff1a\\u6771\\u5317"; 

    istringstream stream(data); 

    wstring  ws; 
    int   code; 
    char  slashCh, uCh; 
    while(stream >> slashCh >> uCh >> hex >> code) 
    { 
     assert(slashCh == '\\' && uCh == 'u'); 
     ws += wchar_t(code); 
    } 

    cout << "Unicode code points:" << endl; 
    for(auto it = ws.begin(); it != ws.end(); ++it) 
    { 
     cout << hex << 0 + *it << endl; 
    } 
    cout << endl; 

    // The following is C++11 specific. 
    cout << "UTF-8 encoding:" << endl; 
    wstring_convert< codecvt_utf8<wchar_t> > converter; 
    string const bytes = converter.to_bytes(ws); 
    for(auto it = bytes.begin(); it != bytes.end(); ++it) 
    { 
     cout << hex << 0 + (unsigned char)*it << ' '; 
    } 
    cout << endl; 
}


2011-11-18 14:56:15

The stream isn't purely Unicode escapes: non-Unicode entries can appear in the ongoing stream. Thanks, though. – Will
