C++: converting a UTF-8 string to an ICU StringPiece

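This is my first post here, so apologies if my title, formatting, or tags are not done the right way.

I want to write a function for a C++ Windows console application that removes diacritics from std::wstring user input. To do this, I am using code created with the help of this question, and I convert my wstring to a UTF-8 string as follows: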

std::string test= wstring_to_utf8 (input); 

std::string wstring_to_utf8 (const std::wstring& str) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}

std::string output= desaxUTF8(test);

where desaxUTF8(...) is:

#include <unicode/utypes.h> 
#include <unicode/unistr.h> 
#include <unicode/ustream.h> 
#include <unicode/translit.h> 
#include <unicode/stringpiece.h> 

std::string desaxUTF8(const std::string& str) {

    StringPiece s(str);
    UnicodeString source = UnicodeString::fromUTF8(s);
    //...
    return result;
}

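This is where I run into the problem. StringPiece s does not correctly receive the value from string str; it is set to an incorrect value instead.

But if I replace StringPiece s(str); with a hard-coded value, say StringPiece s("abcš");, it works perfectly fine.

Using the VS2015 debugger, the value of StringPiece s for the user input abcš is the incorrect 0x0028cdc0 "H\t„", whereas for the hard-coded abcš the value is the correct 0x00b483d4 "abcĹˇ".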
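
A note on reading that debugger output: "abcĹˇ" is presumably the UTF-8 encoding of abcš (bytes 61 62 63 C5 A1) rendered through a single-byte code page such as Windows-1250, where 0xC5 and 0xA1 display as Ĺ and ˇ. One way to check the bytes without depending on how the debugger renders them is to dump them as hex. The sketch below uses only the standard library; the helper name to_hex is hypothetical and not part of ICU or the original code.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of `s` as two lowercase hex digits,
// separated by single spaces, so the encoding can be inspected directly.
std::string to_hex(const std::string& s) {
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 do not sign-extend.
        unsigned value = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        if (i != 0) {
            out += ' ';
        }
        out += buf;
    }
    return out;
}
```

Calling to_hex(test) on the converted string and comparing it with to_hex("abc\xC5\xA1") would show whether the conversion step or a later step is producing the wrong bytes.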

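What am I doing wrong, and what is the best way to fix this? I have already tried the solution recommended in this thread.

I have spent the last two days trying to find a solution, to no avail, so any help would be greatly appreciated.

Thank you in advance.

Post-answer edit: for anyone interested, here is the working code, with massive thanks to Steven R. Loomis for making this happen: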

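std::wstring Menu::removeDiacritis(const std::wstring &input) {

    // Read-only alias to the input; assumes wchar_t holds 16-bit UTF-16 (as on Windows)
    UnicodeString source(FALSE, input.data(), input.length());
    UErrorCode status = U_ZERO_ERROR;
    // Decompose, remove combining marks, then recompose
    Transliterator *accentsConverter = Transliterator::createInstance(
        "NFD; [:M:] Remove; NFC", UTRANS_FORWARD, status);
    if (U_FAILURE(status)) {
        return input; // the transliterator could not be created
    }
    accentsConverter->transliterate(source);
    delete accentsConverter; // the caller owns the instance returned by createInstance

    std::wstring output(source.getBuffer(), source.length());
    return output;
}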

Source

2016-01-12 Peter

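What are you trying to achieve by putting StringPiece directly in the mix? UnicodeString u = UnicodeString::fromUTF8(str) should work just fine, assuming str is a std::string containing valid UTF-8. – NuSkooler

I tried your suggestion and it produces the same incorrect behavior. However, UnicodeString u = UnicodeString::fromUTF8("abcš") does work, so it looks like StringPiece really is unnecessary. It does not solve my problem, though, because the UnicodeString still does not end up with the correct value of string str. – Peter

I think at this point we know the data coming out of wstring_to_utf8() must be bad. What do you have in your std::wstring input? codecvt_utf8 is for converting UTF-8 to/from UTF-32. Since you are on Windows, I am guessing your std::wstring holds UTF-16 data, and you need codecvt_utf8_utf16. – NuSkooler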
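
To make the UTF-16 point concrete, here is a minimal sketch of what a UTF-16 to UTF-8 conversion has to do, including the surrogate-pair handling that distinguishes it from a plain one-code-unit-per-character conversion. It is a hand-written stand-in, not the codecvt_utf8_utf16 facet named in the comment (the <codecvt> facilities are deprecated since C++17). The function name utf16_to_utf8 is hypothetical, and it takes std::u16string so that it behaves the same on every platform; on Windows the contents of a std::wstring would have to be copied into one first.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Throws std::invalid_argument if a surrogate is not part of a valid pair.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF) {
                throw std::invalid_argument("unpaired high surrogate");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i; // the low surrogate has been consumed
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        // Emit 1 to 4 bytes depending on the code point's magnitude.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the input from the question, utf16_to_utf8(u"abc\u0161") yields the bytes 61 62 63 C5 A1. Note that š lies in the Basic Multilingual Plane, so a surrogate-aware conversion only changes the result for characters above U+FFFF.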

@NuSkooler (hi!) is spot on, of course. In any case, try this to convert between UnicodeString and std::wstring, IFF std::wstring is actually UTF-16. (Untested.)

std::wstring doSomething(const std::wstring &input) {
    // sizeof cannot be evaluated in #if, so check the width at compile time instead
    static_assert(sizeof(wchar_t) == sizeof(UChar),
                  "no idea what (typically underspecified) wchar_t actually is.");
    // source is a read-only alias to the input data
    const UnicodeString source(FALSE, input.data(), input.length());
    // DO SOMETHING with the data
    UnicodeString target = SOME_ACTUAL_FUNCTION(source); // <<<< Put your actual code here
    // construct an output wstring
    std::wstring output(target.getBuffer(), target.length());
    // return it
    return output;
}

Source

2016-01-12 22:56:54

Thank you very much! After some minor tweaks this worked for me, with added benefits too! – Peter