2017-03-23 42 views

Ambari Hive UTF-8 problem

I have a problem with Cyrillic symbols in Hive tables. Installed versions:

ambari-server 2.4.2.0-136 
hive-2-5-3-0-37 1.2.1000.2.5.3.0-37 
Ubuntu 14.04 

Here is how to reproduce the problem:

  1. The locale is set to ru_RU.UTF-8:

    [email protected]:~$ locale 
    LANG=ru_RU.UTF-8 
    LANGUAGE=ru_RU:ru 
    LC_CTYPE="ru_RU.UTF-8" 
    LC_NUMERIC="ru_RU.UTF-8" 
    LC_TIME="ru_RU.UTF-8" 
    LC_COLLATE="ru_RU.UTF-8" 
    LC_MONETARY="ru_RU.UTF-8" 
    LC_MESSAGES="ru_RU.UTF-8" 
    LC_PAPER="ru_RU.UTF-8" 
    LC_NAME="ru_RU.UTF-8" 
    LC_ADDRESS="ru_RU.UTF-8" 
    LC_TELEPHONE="ru_RU.UTF-8" 
    LC_MEASUREMENT="ru_RU.UTF-8" 
    LC_IDENTIFICATION="ru_RU.UTF-8" 
    LC_ALL=ru_RU.UTF-8 
    
  2. Connect to Hive and create a test table:

    [email protected]:~$ beeline -n spark -u jdbc:hive2://[email protected]:10000/ 
    
    Connecting to jdbc:hive2://[email protected]:10000/ 
    Connected to: Apache Hive (version 1.2.1000.2.5.3.0-37) 
    Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37) 
    Transaction isolation: TRANSACTION_REPEATABLE_READ 
    Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive 
    
    0: jdbc:hive2://[email protected]> CREATE TABLE `test`(`name` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF-8'); 
    No rows affected (0,127 seconds) 
    
  3. Insert Cyrillic symbols:

    0: jdbc:hive2://[email protected]> insert into test values('привет'); 
    
    INFO : Tez session hasn't been created yet. Opening session 
    INFO : Dag name: insert into test values('привет')(Stage-1) 
    INFO : 
    
    INFO : Status: Running (Executing on YARN cluster with App id application_1490211406894_2481) 
    
    INFO : Map 1: -/- 
    INFO : Map 1: 0/1 
    INFO : Map 1: 0(+1)/1 
    INFO : Map 1: 1/1 
    INFO : Loading data to table default.test from hdfs://hadoop.domain.com:8020/apps/hive/warehouse/test/.hive-staging_hive_2017-03-23_13-41-46_215_3133047104896717605-116/-ext-10000 
    INFO : Table default.test stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6] 
    No rows affected (6,652 seconds) 
    
  4. Select from the table:

    0: jdbc:hive2://[email protected]> select * from test; 
    +------------+--+ 
    | test.name | 
    +------------+--+ 
    | [email protected]  | 
    +------------+--+ 
    1 row selected (0,162 seconds) 
    

I have read through a lot of Apache Hive bug reports and tested Unicode, UTF-8, UTF-16, and some ISO encodings, with no luck.

Can anyone help me?

Thanks!


Cyrillic small letter Pe 'п' (Unicode U+043F) ==> question mark '?' (Unicode U+003F), Cyrillic small letter Er 'р' (U+0440) ==> '@' (U+0040), and so on. The high byte of the Unicode code point is lost for every character... – JosefZ


@JosefZ thx, that is new information for me. Any idea how to deal with it? – canavar
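
The mapping described in the comment above can be checked with a small sketch. It assumes the corruption is exactly "keep the low 8 bits of each code point", which is the comment's hypothesis and not a confirmed diagnosis; the helper name `drop_high_byte` is made up for this illustration and is not part of Hive or Beeline.

```python
def drop_high_byte(text: str) -> str:
    """Keep only the low 8 bits of each code point.

    Hypothetical helper: it imitates the corruption suggested in the
    comment thread, it is not code taken from Hive.
    """
    return "".join(chr(ord(ch) & 0xFF) for ch in text)


original = "привет"
mangled = drop_high_byte(original)

# Show each character next to what survives of it.
for src, dst in zip(original, mangled):
    print(f"U+{ord(src):04X} {src} -> U+{ord(dst):04X} {dst}")

print(mangled)                          # ?@825B
print(len(original.encode("utf-8")))    # 12 bytes when stored as real UTF-8
print(len(mangled.encode("ascii")))     # 6 bytes after truncation
```

Under that assumption the six Cyrillic letters collapse to six ASCII characters. That would also be consistent with `totalSize=7` in the insert log (six bytes plus a newline, where intact UTF-8 would need twelve bytes plus a newline), which hints that the value may already be damaged when it is written, not only when it is displayed. The sketch does not show which component drops the byte.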

Answers