2015-07-19 58 views

Sort a map of (Spanish) accented words in Rcpp

I can successfully sort Spanish words with accented vowels by specifying a UTF-8 locale in std::sort:

// [[Rcpp::export]] 
std::vector<std::string> sort_words(std::vector<std::string> x) { 
    std::sort(x.begin(), x.end(), std::locale("en_US.UTF-8")); 
    return x; 
} 

/*** R 
words <- c("casa", "árbol", "zona", "árbol", "casa", "libro") 
sort_words(words) 
*/ 

returns (as expected): 
[1] "árbol" "árbol" "casa" "casa" "libro" "zona" 

However, I can't figure out how to do the same with a map:

// slightly modified version of tableC on http://adv-r.had.co.nz/Rcpp.html 
// [[Rcpp::export]] 
std::map<String, int> table_words(CharacterVector x) { 
    std::setlocale(LC_ALL, "en_US.UTF-8"); 
    // std::setlocale(LC_COLLATE, "en_US.UTF-8"); // also tried this instead of previous line 
    std::map<String, int> counts; 
    int n = x.size(); 
    for (int i = 0; i < n; i++) {
        counts[x[i]]++;
    }
    return counts; 
} 

/*** R 
words <- c("casa", "árbol", "zona", "árbol", "casa", "libro") 
table_words(words) 
*/ 

returns: 
casa libro zona árbol 
    2  1  1  2 

but I want: 
árbol casa libro zona  
    2  2  1  1 

Any ideas on how to make table_words put the accented "árbol" before "casa", either within Rcpp or even back out in R with base::sort?

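Also, std::sort(..., std::locale("en_US.UTF-8")) only sorts the words correctly on my Linux machine, which has gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1). It does not work on my Mac (OS X 10.10.3), which has Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn). Any clues as to what my Mac compiler is missing that my Linux compiler has?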

Here is my script, followed by my sessionInfo for both machines:

// [[Rcpp::plugins(cpp11)]] 
#include <locale> 
#include <clocale> 
#include <Rcpp.h> 
using namespace Rcpp; 

// [[Rcpp::export]] 
std::vector<std::string> sort_words(std::vector<std::string> x) { 
    std::sort(x.begin(), x.end(), std::locale("en_US.UTF-8")); 
    return x; 
} 

// [[Rcpp::export]] 
std::map<String, int> table_words(CharacterVector x) { 
    // std::setlocale(LC_ALL, "en_US.UTF-8"); // tried this instead of next line 
    std::setlocale(LC_COLLATE, "en_US.UTF-8"); 
    std::map<String, int> counts; 
    int n = x.size(); 
    for (int i = 0; i < n; i++) {
        counts[x[i]]++;
    }
    return counts; 
} 

/*** R 
words <- c("casa", "árbol", "zona", "árbol", "casa", "libro") 
sort_words(words) 
table_words(words) 
sort(table_words(words), decreasing = T) 
output_from_Rcpp <- table_words(words) 
sort(names(output_from_Rcpp)) 
*/ 

> words <- c("casa", "árbol", "zona", "árbol", "casa", "libro") 

> sort_words(words) 
[1] "árbol" "árbol" "casa" "casa" "libro" "zona" 

> table_words(words) 
casa libro zona árbol 
    2  1  1  2 

> sort(table_words(words), decreasing = T) 
casa árbol libro zona 
    2  2  1  1 

> output_from_Rcpp <- table_words(words) 

> sort(names(output_from_Rcpp)) 
[1] "árbol" "casa" "libro" "zona" 

sessionInfo on linux machine: 
R version 3.2.0 (2015-04-16) 
Platform: x86_64-pc-linux-gnu (64-bit) 
Running under: Ubuntu 14.04 LTS 

locale: 
[1] en_US.UTF-8 

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

loaded via a namespace (and not attached): 
[1] tools_3.2.0 Rcpp_0.11.6 

sessionInfo on Mac: 
R version 3.2.1 (2015-06-18) 
Platform: x86_64-apple-darwin13.4.0 (64-bit) 
Running under: OS X 10.10.3 (Yosemite) 

locale: 
[1] en_US.UTF-8 

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] textcat_1.0-3 readr_0.1.1 rvest_0.2.0 

loaded via a namespace (and not attached): 
[1] httr_1.0.0 selectr_0.2-3 R6_2.1.0  magrittr_1.5 tools_3.2.1 curl_0.9.1 Rcpp_0.11.6 slam_0.1-32 stringi_0.5-5 
[10] tau_0.0-18 stringr_1.0.0 XML_3.98-1.3 

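Forgive my ignorance, but since when does std::sort take a third argument that is a locale? The third argument to std::sort should be a function or functor that compares two items, not a locale. – PaulMcKenzie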


@PaulMcKenzie: Among other things, a locale is a functor that compares two items. http://en.cppreference.com/w/cpp/locale/locale/operator() –


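I don't know anything about Rcpp, but are you aware that for std::map the ordering is part of the type itself, and that you need a custom comparator to get a different order? –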

Answers


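It makes no sense to apply std::sort to a std::map, because a map is always sorted, by definition. The ordering is part of the concrete type instantiated from the template. std::map has a third, "hidden" type parameter for the comparison function used to order the keys; it defaults to std::less for the key type. See http://en.cppreference.com/w/cpp/container/map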
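To make the role of that third type parameter concrete, here is a minimal sketch that deliberately avoids locales. The names keys_in_order, make_counts, AscendingMap and DescendingMap are made up for this illustration. The only point is that two maps holding the same keys iterate in different orders because their comparator types differ.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Default comparator: std::less<std::string>, so keys come out ascending.
using AscendingMap = std::map<std::string, int>;

// Same key and value types, but a different comparator type.
// This is a distinct map type with a different built-in ordering.
using DescendingMap = std::map<std::string, int, std::greater<std::string>>;

// Hypothetical helper: fill either map type with the same ASCII keys.
template <typename MapType>
MapType make_counts()
{
    MapType m;
    m["casa"] = 2;
    m["zona"] = 1;
    m["libro"] = 1;
    return m;
}

// Hypothetical helper: collect the keys of a map in its iteration order.
template <typename MapType>
std::vector<std::string> keys_in_order(const MapType& m)
{
    std::vector<std::string> keys;
    for (auto const& entry : m)
    {
        keys.push_back(entry.first);
    }
    return keys;
}
```

Calling keys_in_order(make_counts<AscendingMap>()) and keys_in_order(make_counts<DescendingMap>()) yields the same keys in opposite orders, without any call to std::sort.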

In your case, you can use std::locale as the comparison type and pass std::locale("en-US") (or whatever name is appropriate for your system) to the map's constructor.

Here is an example. It uses C++11, but you can easily use the same solution in C++03.

#include <map> 
#include <iostream> 
#include <string> 
#include <locale> 
#include <exception> 

using Map = std::map<std::string, int, std::locale>; 

int main() 
{ 
    try
    {
        Map map(std::locale("en-US"));
        map["casa"] = 1;
        map["árbol"] = 2;
        map["zona"] = 3;
        map["árbol"] = 4;
        map["casa"] = 5;
        map["libro"] = 6;

        for (auto const& map_entry : map)
        {
            std::cout << map_entry.first << " -> " << map_entry.second << "\n";
        }
    }
    catch (std::exception const& exc)
    {
        std::cerr << exc.what() << "\n";
    }
} 

Output:

árbol -> 4 
casa -> 5 
libro -> 6 
zona -> 3 

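Of course, you have to be aware that std::locale is highly implementation-dependent: which locale names exist, and how they collate, varies between platforms. You may be better off with Boost.Locale.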
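Because the set of valid locale names differs between systems, one defensive option is to try several names and fall back to the classic "C" locale. The sketch below is only an illustration: first_available_locale is a hypothetical helper, and the candidate names you pass in are assumptions you would need to check on your own machines. Note that the "C" fallback keeps the program running but does not give accent-aware ordering.

```cpp
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: return the first locale in `candidates` that this
// platform can construct, or the classic "C" locale if none of them exists.
std::locale first_available_locale(const std::vector<std::string>& candidates)
{
    for (auto const& name : candidates)
    {
        try
        {
            // Throws std::runtime_error if the name is unknown here.
            return std::locale(name.c_str());
        }
        catch (std::runtime_error const&)
        {
            // Not available on this system; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

For example, first_available_locale({"en_US.UTF-8", "en-US"}) could be passed to the map's constructor instead of a hard-coded std::locale("en-US").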

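Another issue is that this solution may look confusing, because std::locale is not something many programmers would associate with a comparison function. It is almost a bit too clever.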

So here is a possibly more readable alternative:

#include <map> 
#include <iostream> 
#include <string> 
#include <locale> 
#include <exception> 

struct ComparisonUsingLocale 
{ 
    std::locale locale{ "en-US" }; 

    bool operator()(std::string const& lhs, std::string const& rhs) const 
    { 
        return locale(lhs, rhs);
    } 
}; 

using Map = std::map<std::string, int, ComparisonUsingLocale>; 

int main() 
{ 
    try
    {
        Map map;
        map["casa"] = 1;
        map["árbol"] = 2;
        map["zona"] = 3;
        map["árbol"] = 4;
        map["casa"] = 5;
        map["libro"] = 6;

        for (auto const& map_entry : map)
        {
            std::cout << map_entry.first << " -> " << map_entry.second << "\n";
        }
    }
    catch (std::exception const& exc)
    {
        std::cerr << exc.what() << "\n";
    }
} 
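
The same idea can be applied to the word-counting loop from the question. The following is a plain C++ sketch, not an Rcpp function: count_words and LocaleMap are hypothetical names, and whether Rcpp can convert a map with a non-default comparator back to R is not something established in this thread, so that step is left out.

```cpp
#include <locale>
#include <map>
#include <string>
#include <vector>

// A map whose keys are ordered by the collation rules of a std::locale.
using LocaleMap = std::map<std::string, int, std::locale>;

// Hypothetical plain-C++ counterpart of table_words: count each word,
// keeping the keys in the order defined by `loc`.
LocaleMap count_words(const std::vector<std::string>& words,
                      const std::locale& loc)
{
    LocaleMap counts(loc);  // the locale object acts as the comparator
    for (auto const& word : words)
    {
        ++counts[word];     // a missing key starts at 0
    }
    return counts;
}
```

On a system where the name is valid, count_words(words, std::locale("en_US.UTF-8")) would use that locale's collation for the keys; the locale name itself is an assumption that depends on the platform.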

Thanks for your help, but still no luck:
Earls-MBP:C++ earlbrown$ g++ -std=c++11 order_with_accents.cpp -o go
Earls-MBP:C++ earlbrown$ ./go
collate_byname<char>::collate_byname failed to construct for en-US
Earls-MBP:C++ earlbrown$ g++ -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix –