DataStax Cassandra File System - fixed-width text file - Hive integration issue

I am trying to read a fixed-width text file stored in the Cassandra File System (CFS) using Hive. I am able to query the file when I run from the Hive client. However, when I try to run through the Hadoop Hive JDBC driver, it says the table is not available or the connection is bad. Below are the steps I followed.

Input file (employees.dat):

21736Ambalavanar              Thirugnanam              BOY-EAG        2005-05-091992-11-18
21737Anand                    Jeyamani                 BOY-AST        2005-05-091985-02-12
31123Muthukumar               Rajendran                BOY-EES        2009-08-121983-02-23

Start the Hive client:

bash-3.2# dse hive; 
Logging initialized using configuration in file:/etc/dse/hive/hive-log4j.properties 
Hive history file=/tmp/root/hive_job_log_root_201209250900_157600446.txt 
hive> use HiveDB; 
OK 
Time taken: 1.149 seconds 

Create a Hive external table pointing to the fixed-width text file:

hive> CREATE EXTERNAL TABLE employees (empid STRING, firstname STRING, lastname STRING, dept STRING, dateofjoining STRING, dateofbirth STRING) 
    > ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
    > WITH SERDEPROPERTIES ("input.regex" = "(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*") 
    > LOCATION 'cfs://hostname:9160/folder/'; 
OK 
Time taken: 0.524 seconds 
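
For reference, RegexSerDe assigns capture group i of input.regex to column i of the table, so the declared widths here are 5/25/25/15/10/10. A quick standalone Java check (just a sketch, independent of Hive) of how that regex splits one record:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedWidthRegexDemo {
    public static void main(String[] args) {
        // Same pattern as the "input.regex" SERDEPROPERTIES value above
        Pattern p = Pattern.compile("(.{5})(.{25})(.{25})(.{15})(.{10})(.{10}).*");

        // Rebuild one sample record with the declared field widths: 5/25/25/15/10/10
        String line = String.format("%-5s%-25s%-25s%-15s%-10s%-10s",
                "21736", "Ambalavanar", "Thirugnanam", "BOY-EAG", "2005-05-09", "1992-11-18");

        Matcher m = p.matcher(line);
        if (m.matches()) {
            // RegexSerDe hands capture group i to column i of the table definition;
            // trim() here is only for readable output - the SerDe keeps the raw padded value
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("column " + i + " = [" + m.group(i).trim() + "]");
            }
        }
    }
}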

Select * from the table:

hive> select * from employees; 
OK 
21736 Ambalavanar      Thirugnanam      BOY-EAG  2005-05-09  1992-11-18 
21737 Anand       Jeyamani      BOY-AST  2005-05-09  1985-02-12 
31123 Muthukumar      Rajendran      BOY-EES  2009-08-12  1983-02-23 
Time taken: 0.698 seconds 

However, selecting specific fields from the Hive table throws a permission error (first issue):

hive> select empid, firstname from employees; 
Total MapReduce jobs = 1 
Launching Job 1 out of 1 
Number of reduce tasks is set to 0 since there's no reduce operator 
java.io.IOException: The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------ 
     at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:108) 
     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856) 
     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:416) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093) 
     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850) 
     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824) 
     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452) 
     at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136) 
     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133) 
     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) 
     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332) 
     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123) 
     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931) 
     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255) 
     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212) 
     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) 
     at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671) 
     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554) 
     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
     at java.lang.reflect.Method.invoke(Method.java:616) 
     at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
Job Submission failed with exception 'java.io.IOException(The ownership/permissions on the staging directory cfs:/tmp/hadoop-root/mapred/staging/root/.staging is not as expected. It is owned by root and permissions are rwxrwxrwx. The directory must be owned by the submitter root or by root and permissions must be rwx------)' 
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask 
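
The message spells out what the job client expects: the staging directory must be owned by the submitter with permissions rwx------ (700), whereas on CFS it shows up as rwxrwxrwx. Below is only a hedged sketch of tightening it through the Hadoop FileSystem API - whether CFS actually honours setPermission is an assumption; the host placeholder and staging path are the ones from the error above. The same thing from the shell would presumably be a chmod 700 via dse hadoop fs, but I have not verified that.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class FixStagingPermissions {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // "hostname" is a placeholder, as in the CREATE TABLE location above
        FileSystem fs = FileSystem.get(new URI("cfs://hostname/"), conf);

        // Staging directory named in the IOException; the job client requires 700
        Path staging = new Path("/tmp/hadoop-root/mapred/staging/root/.staging");
        fs.setPermission(staging, new FsPermission((short) 0700));

        System.out.println("now: " + fs.getFileStatus(staging).getPermission());
    }
}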

The second issue is that when I try to run a select * query through the Hive JDBC driver (from outside the dse/cassandra node), it says the table employees is not available. The external table behaves like a temporary table and does not persist: when I run "hive> show tables", the employees table is not listed. Can anyone help me figure out the problem?
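
For context, this is roughly the JDBC access being attempted. The HiveServer1-era driver class and port 10000 are assumptions on my part, and the host is a placeholder:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSelect {
    public static void main(String[] args) throws Exception {
        // HiveServer1-era driver; a HiveServer2 setup would use org.apache.hive.jdbc.HiveDriver instead
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive://hostname:10000/default", "", "");

        Statement stmt = conn.createStatement();
        stmt.execute("use HiveDB");

        ResultSet rs = stmt.executeQuery("select * from employees");
        while (rs.next()) {
            System.out.println(rs.getString(1) + " " + rs.getString(2));
        }

        conn.close();
    }
}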

Answer


I don't have an immediate answer for the first issue, but the second one looks like it is due to a known problem.

There is a bug in DSE 2.1 that drops external tables created over CFS files from the metastore when "show tables" is run. Only the table metadata is removed; the data remains in CFS, so if you recreate the table definition you do not have to reload it. Tables backed by Cassandra ColumnFamilies are not affected by this bug. This has been fixed for the 2.2 release of DSE, which is due out very soon.

I'm not familiar with the Hive JDBC driver, but if it issues a Show Tables command at any point, that could trigger this bug.
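
For illustration, something as simple as the following on the client side would be enough to hit it, since the statement reaches the metastore directly (same placeholder connection details as in the question; whether the driver also does this implicitly, e.g. for DatabaseMetaData table enumeration, is an assumption):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ShowTablesTrigger {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive://hostname:10000/default", "", "");

        // An explicit SHOW TABLES is exactly the statement that would tickle the DSE 2.1 bug
        Statement stmt = conn.createStatement();
        ResultSet rs = stmt.executeQuery("show tables");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }

        conn.close();
    }
}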


Thanks Beobal. I will raise the first issue in the DataStax forums. – Ambal