Iterating through a UTF-8 string in C++11

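I want to iterate through a UTF-8 string. As I understand it, UTF-8 characters have variable length, so I can't just iterate over the chars; I have to use some kind of conversion. I believe modern C++ has a facility for this, but I don't know what it is.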

#include <iostream> 
#include <string> 

int main() 
{ 
    std::string text = u8"řabcdě"; 
    std::cout << text << std::endl; // Prints fine 
    std::cout << "First letter is: " << text.at(0) << text.at(1) << std::endl; // Again fine. So 'ř' is a 2 byte letter? 

    for(auto it = text.begin(); it < text.end(); it++)
    {
        // Obviously wrong. Outputs only the ASCII part of the text (a, b, c, d) correctly
        std::cout << "Iterating: " << *it << std::endl;
    }
}

Compiled with: clang++ -std=c++11 -stdlib=libc++ test.cpp

From what I've read, wchar_t and wstring should not be used.


2014-09-27 Jan Šimek

There is no such thing as a "UTF-8 character". Jumping into writing code before you are familiar with the topic is frustrating and ill-advised. – 2014-09-27 11:21:27

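Are you on some Unixoid or on Windows? Do you want code units, code points, or letters? (Characters are ridiculously context-dependent (and even the context may not be enough to decide), and Windows adds extra pain.) – Deduplicator 2014-09-27 11:21:37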

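You might want to look [here](http://en.cppreference.com/w/cpp/locale/wstring_convert/from_bytes). Keep in mind that it doesn't work in gcc, since they haven't implemented that part of the standard yet, but it works in clang/libc++ and should work with VS2013, IIRC. – 2014-09-27 11:38:40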

As suggested in the comments, I used std::wstring_convert:

#include <codecvt> 
#include <locale> 
#include <iostream> 
#include <string> 

int main() 
{ 
    std::u32string input = U"řabcdě"; 

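    // Converts between UTF-8 bytes and UTF-32 code points (char32_t)
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;

    // Each char32_t in a u32string is one code point
    for(char32_t c : input)
    {
        // to_bytes encodes a single code point back to UTF-8 for printing
        std::cout << converter.to_bytes(c) << std::endl;
    }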
}

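Maybe I should have stated more explicitly in the question that I wanted to know whether this can be done in C++11 without using any third-party library such as ICU or UTF8-CPP.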


2014-09-28 09:57:06

What version of g++ are you using? It might be part of C++14. – Splash 2015-11-09 03:24:11

I'm using clang: Apple LLVM version 7.0.0 (clang-700.0.72), but this is all C++11. You can check http://en.cppreference.com – 2015-11-09 06:19:11

I ran it on http://en.cppreference.com/w/cpp/locale/codecvt_utf8, selected the 4.9 C++11 option, and compiled it. Could you take a look? – Splash 2015-11-09 17:27:37
