Inserting a narrow string into std::basic_ostream<wchar_t>

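According to cppref, there is an overload of operator<< for std::basic_ostream<wchar_t> that accepts const char*. It looks like the conversion simply widens each char to a wchar_t; that is, the number of wide characters converted (inserted) equals the number of narrow characters. This raises a problem. The narrow string may be an encoding of international characters, for example Chinese characters in GB2312. Suppose further that sizeof(wchar_t) is 2 and that UTF-16 is used. How is this naive character-by-character conversion supposed to work then?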


2015-10-02 Lingxi

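I would say it does *not* work. If you need to convert between different encodings and character widths, you should look at a library that handles it, such as [ICU](http://site.icu-project.org/). –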

@JoachimPileborg Then how does wide-character logging work in Boost.Log? See http://www.boost.org/doc/libs/1_59_0/libs/log/doc/html/log/tutorial/wide_char.html – Lingxi

I can't say anything about Boost.Log, but it probably just does the appropriate conversion somewhere? –

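I just checked in Visual Studio 2015, and you are right. Each char is simply widened to a wchar_t without any conversion. It seems to me that you have to convert the narrow string to a wide string yourself. There are several ways to do that, some of which have already been suggested.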
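One of the simpler ways, sketched here as an illustration rather than something taken from this thread, is the C library function `std::mbsrtowcs`, which decodes according to the current global C locale. The helper name `widen_with_c_locale` is hypothetical.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: convert a multibyte string to a wide string using
// the encoding of the current global C locale (LC_CTYPE).
// Returns false if the input contains an invalid multibyte sequence.
bool widen_with_c_locale(const char* src, std::wstring& out)
{
    std::mbstate_t state = std::mbstate_t();
    const char* p = src;

    // First pass: a null destination only counts the wide characters needed.
    std::size_t const len = std::mbsrtowcs(nullptr, &p, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        return false;

    out.resize(len);
    if (len == 0)
        return true;

    // Second pass: convert for real, starting again from a clean state.
    state = std::mbstate_t();
    p = src;
    std::mbsrtowcs(&out[0], &p, len, &state);
    return true;
}
```

For non-ASCII input the program would first have to select a matching locale, for example with `std::setlocale(LC_CTYPE, "")`. Which locale names exist for GB2312 is platform-specific and is not something this sketch can promise.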

Here I suggest a way to do it with pure C++ facilities, assuming your C++ compiler and standard library are complete enough (Visual Studio, or GCC on Linux, and only there):

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <locale>
#include <string>
#include <vector>

void clear_mbstate (std::mbstate_t & mbs);

// Convert `size` bytes starting at `src` into `outstr`, decoding them with
// the codecvt facet of `loc`. Undecodable bytes are replaced by L'?'.
void
towstring_internal (std::wstring & outstr, const char * src, std::size_t size,
    std::locale const & loc)
{
    if (size == 0)
    {
        outstr.clear ();
        return;
    }

    typedef std::codecvt<wchar_t, char, std::mbstate_t> CodeCvt;
    const CodeCvt & cdcvt = std::use_facet<CodeCvt>(loc);
    std::mbstate_t state;
    clear_mbstate (state);

    char const * from_first = src;
    std::size_t const from_size = size;
    char const * const from_last = from_first + from_size;
    char const * from_next = from_first;

    std::vector<wchar_t> dest (from_size);

    wchar_t * to_first = &dest.front ();
    std::size_t to_size = dest.size ();
    wchar_t * to_last = to_first + to_size;
    wchar_t * to_next = to_first;

    CodeCvt::result result;
    std::size_t converted = 0;
    while (true)
    {
        result = cdcvt.in (
            state, from_first, from_last,
            from_next, to_first, to_last,
            to_next);
        // XXX: Even if only half of the input has been converted the
        // in() method returns CodeCvt::ok. I think it should return
        // CodeCvt::partial.
        if ((result == CodeCvt::partial || result == CodeCvt::ok)
            && from_next != from_last)
        {
            // Grow the buffer. Offsets are measured from the start of the
            // buffer because to_first may already have been advanced past
            // a replacement character.
            std::size_t const first_off = to_first - &dest.front ();
            converted = to_next - &dest.front ();
            to_size = dest.size () * 2;
            dest.resize (to_size);
            to_first = &dest.front () + first_off;
            to_last = &dest.front () + to_size;
            to_next = &dest.front () + converted;
            continue;
        }
        else if (result == CodeCvt::ok && from_next == from_last)
            break;
        else if (result == CodeCvt::error
            && to_next != to_last && from_next != from_last)
        {
            // Skip the offending byte, emit '?', and restart from there.
            clear_mbstate (state);
            ++from_next;
            from_first = from_next;
            *to_next = L'?';
            ++to_next;
            to_first = to_next;
        }
        else
            break;
    }
    converted = to_next - &dest[0];

    outstr.assign (dest.begin (), dest.begin () + converted);
}

void
clear_mbstate (std::mbstate_t & mbs)
{
    // Initialize/clear mbstate_t type.
    // XXX: This is just a hack that works. The shape of mbstate_t varies
    // from single unsigned to char[128]. Without some sort of initialization
    // the codecvt::in/out methods randomly fail because the initial state is
    // random/invalid.
    std::memset (&mbs, 0, sizeof (std::mbstate_t));
}
```

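This function is part of the log4cplus library, and it works. It uses the codecvt facet to do the conversion. You have to pass it an appropriately set up locale.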
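To show what passing a locale looks like in practice, here is a compact sketch of the same codecvt technique. It is not the log4cplus code: `widen_with_locale` and `locale_or_classic` are hypothetical names, the buffer is never grown, and errors are reported by throwing instead of substituting `?`. It assumes the encoding never yields more wide characters than input bytes.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper: one-shot conversion through the codecvt facet of loc.
// Assumes src.size() wide characters are always enough room for the result.
std::wstring widen_with_locale(const std::string& src, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> CodeCvt;
    if (src.empty())
        return std::wstring();

    const CodeCvt& cvt = std::use_facet<CodeCvt>(loc);
    std::mbstate_t state = std::mbstate_t();  // value-initialized start state
    std::vector<wchar_t> dest(src.size());

    const char* const from_last = src.data() + src.size();
    const char* from_next = 0;
    wchar_t* to_next = 0;
    CodeCvt::result r = cvt.in(state,
        src.data(), from_last, from_next,
        &dest[0], &dest[0] + dest.size(), to_next);

    // Anything other than a complete, successful conversion is an error here.
    if (r != CodeCvt::ok || from_next != from_last)
        throw std::runtime_error("narrow-to-wide conversion failed");
    return std::wstring(&dest[0], to_next);
}

// Hypothetical helper: try a named locale, fall back to the classic "C" locale.
std::locale locale_or_classic(const char* name)
{
    try
    {
        return std::locale(name);
    }
    catch (const std::runtime_error&)
    {
        return std::locale::classic();
    }
}
```

A call might look like `widen_with_locale(bytes, locale_or_classic("zh_CN.GB2312"))`. The locale name is an assumption; whether it exists depends on the platform and on which locales are installed.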

Visual Studio may have trouble giving you a locale that is properly set up for GB2312. You may have to use _setmbcp() to make it work. For details, see "double byte character sequence conversion issue in Visual Studio 2015".


2015-10-22 11:51:08 wilx
