2016-11-07

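Forcing the encoding of a multilingual table imported from a SQL database with RMySQL

I am reading a table of cities (with names in English, Arabic, and other languages) from a SQL database. Every entry's encoding is reported as unknown, and when I try to force it with Encoding, some entries change and some stay unknown.

I also tried dbGetQuery, but the result is the same: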

con <- dbConnect(RMySQL::MySQL(), host = "***",dbname="***",user = "***", password = "***") 

dbGetQuery(con,"set names utf8") 

Q1 <- dbSendQuery(con, "SELECT * FROM cities") 

city <- fetch(Q1, n = -1) 

> Encoding(city$name) %>% table() 
. 
unknown 
    45734 

When I force the encoding, some entries are converted, but entries containing Arabic characters, for example, are not.

> Encoding(city$name) <- "UTF-8" 
> Encoding(city$name) %>% table() 
. 
unknown UTF-8 
    44920  814 

Here is the result of SHOW VARIABLES LIKE 'character_set_%':

dbSendQuery(con, "SET NAMES UTF8; ") 
<MySQLResult:8,3,3> 
> dbGetQuery(con, "SHOW VARIABLES LIKE 'character_set_%'") 
             Variable_name                      Value
1     character_set_client                       utf8
2 character_set_connection                       utf8
3   character_set_database                     latin1
4 character_set_filesystem                     binary
5    character_set_results                       utf8
6     character_set_server                     latin1
7     character_set_system                       utf8
8       character_sets_dir /usr/share/mysql/charsets/

And here is the session info:

> sessionInfo() 
R version 3.3.1 (2016-06-21) 
Platform: x86_64-w64-mingw32/x64 (64-bit) 
Running under: Windows >= 8 x64 (build 9200) 

locale: 
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C       
[5] LC_TIME=English_United States.1252  

attached base packages: 
[1] stats  graphics grDevices utils  datasets methods base  

other attached packages: 
[1] stringi_1.1.2 tidyr_0.5.1 purrr_0.2.2 dplyr_0.5.0 RMySQL_0.10.9 DBI_0.4-1  

loaded via a namespace (and not attached): 
[1] magrittr_1.5 R6_2.1.2  assertthat_0.1 tools_3.3.1 tibble_1.1  Rcpp_0.12.6 

Here is the encoding used in the database (screenshot: table info from phpMyAdmin).

Answer

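As the docs for Encoding state:

ASCII strings will never be marked with a declared encoding, since their representation is the same in all supported encodings.

Illustration: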

v <- c("café", "floor", "window", "naïve") 
Encoding(v) <- "UTF-8" 

Encoding(v) 
# [1] "UTF-8" "unknown" "unknown" "UTF-8" 

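Some of your city names are in English, so they probably contain no non-ASCII characters, and those entries will keep reporting "unknown" no matter what encoding you declare.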
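One way to check this on your own data is to compare the declared encoding against whether each string is pure ASCII. The following is only a minimal sketch: the vector `name` is a hypothetical stand-in for `city$name`, written with `\u` escapes so that it does not depend on the encoding of the script file, and the helper `is_ascii` is an illustrative name, not part of any package.

```r
# Hypothetical stand-in for city$name (assumption: the real column is UTF-8 text).
# The \u escapes give "Cairo", the Arabic spelling of Cairo, "Zürich" and "London".
name <- c("Cairo",
          "\u0627\u0644\u0642\u0627\u0647\u0631\u0629",
          "Z\u00fcrich",
          "London")
Encoding(name) <- "UTF-8"

# TRUE when every code point in the string is below 128, i.e. plain ASCII
is_ascii <- vapply(name,
                   function(s) all(utf8ToInt(s) < 128L),
                   logical(1),
                   USE.NAMES = FALSE)

# Cross-tabulate: ASCII-only strings should all sit in the "unknown" column
print(table(ascii = is_ascii, declared = Encoding(name)))
```

If every "unknown" entry in the real column turns out to be ASCII-only, the mixed result from Encoding is expected and nothing is wrong with those rows. Any "unknown" entries that are not ASCII-only would need a separate look.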
