2014-02-11 68 views
1

DataStax Enterprise 3.2 - Hive S3 NoSuchBucket

I'm running DSE 3.2.4 with analytics enabled. I'm trying to offload one of my tables into S3 for long-term storage. I created the following table in Hive:

CREATE EXTERNAL TABLE events_archive (
    event_id string, 
    time string, 
    type string, 
    source string, 
    value string 
) 
PARTITIONED BY (year string, month string, day string, hour string) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
LOCATION 's3n://com.mydomain.events/'; 

I then tried to load some sample data into it with this query:

CREATE TEMPORARY FUNCTION c_to_string AS 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString'; 
SET hive.exec.dynamic.partition.mode=nonstrict; 
SET hive.exec.dynamic.partition=true; 


INSERT OVERWRITE TABLE events_archive 
PARTITION (year, month, day, hour) 
SELECT c_to_string(column4, 'uuid') AS event_id, 
     from_unixtime(CAST(column3/1000 AS int)) AS time, 
     CASE column1 
     WHEN 'pageviews-push' THEN 'page_view' 
     WHEN 'score_realtime-internal' THEN 'realtime_score' 
     ELSE 'social_data' 
     END AS type, 
     CASE column1 
     WHEN 'pageviews-push' THEN 'internal' 
     WHEN 'score_realtime-internal' THEN 'internal' 
     ELSE split(column1, '-')[0] 
     END AS source, 
     value, 
     year(from_unixtime(CAST(column3/1000 AS int))) AS year, 
     month(from_unixtime(CAST(column3/1000 AS int))) AS month, 
     day(from_unixtime(CAST(column3/1000 AS int))) AS day, 
     hour(from_unixtime(CAST(column3/1000 AS int))) AS hour, 
     c_to_string(key2, 'blob') AS content_id 
    FROM events 
WHERE column2 = 'data' 
    AND value IS NOT NULL 
    AND value != '' 
LIMIT 10; 
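For reference, the partition values that the `SELECT` derives from `column3` can be checked outside Hive. A minimal sketch (`ts_millis` is a hypothetical sample value; note that Hive's `from_unixtime()` uses the session time zone, where this sketch uses UTC):

```python
from datetime import datetime, timezone

# column3 holds epoch milliseconds; the query divides by 1000, feeds the
# result to from_unixtime(), and derives the year/month/day/hour
# dynamic-partition values from it.
ts_millis = 1392150235000  # hypothetical sample timestamp
t = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
partitions = (t.year, t.month, t.day, t.hour)
print(partitions)  # (2014, 2, 11, 20)
```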

I end up with this exception:

2014-02-11 20:23:55,810 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: org.apache.hadoop.fs.s3.S3Exception(org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>) 
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error> 
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:156) 
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:195) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) 
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) 
    at $Proxy14.retrieveINode(Unknown Source) 
    at org.apache.hadoop.fs.s3.S3FileSystem.mkdir(S3FileSystem.java:148) 
    at org.apache.hadoop.fs.s3.S3FileSystem.mkdirs(S3FileSystem.java:141) 
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126) 
    at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:165) 
    at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:222) 
    at org.apache.hadoop.hive.ql.Context.getExternalTmpFileURI(Context.java:315) 
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4049) 
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6205) 
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136) 
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762) 
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531) 
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) 
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) 
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) 
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) 
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) 
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) 
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) 
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) 
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) 
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
    at java.lang.reflect.Method.invoke(Method.java:597) 
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156) 
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error> 
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:416) 
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:752) 
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1601) 
    at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1544) 
    at org.jets3t.service.S3Service.getObject(S3Service.java:2072) 
    at org.jets3t.service.S3Service.getObject(S3Service.java:1310) 
    at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:144) 
... 33 more 

Does the Hive S3 connector work with the latest DSE? Or could I be doing something wrong?


Is your bucket name actually `10.226.118.113`? – Rico


No; the bucket name, as shown in the `CREATE TABLE`, is `com.mydomain.events`. `10.226.118.113` is the IP address of the node I'm running the command on. –


Do you need to specify the bucket in the query? It's apparently taking your IP address as the bucket name. – Rico
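That diagnosis can be illustrated: the s3/s3n filesystems treat the URI authority as the bucket name, so whatever ends up in that slot is what S3 is asked for. A minimal sketch using Python's `urllib` (illustrative only; Hadoop's actual URI handling is in Java):

```python
from urllib.parse import urlparse

# The s3/s3n filesystems treat the URI authority as the bucket name.
print(urlparse("s3n://com.mydomain.events/archive").netloc)
# -> com.mydomain.events

# If a node IP ends up in the authority slot, S3 is asked for a bucket
# named after the IP -- exactly the NoSuchBucket error above.
print(urlparse("s3://10.226.118.113/%2F").netloc)
# -> 10.226.118.113
```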

Answers

3

Try the following in your Hive installation:

hive-site.xml

<property> 
    <name>fs.default.name</name> 
    <value>s3n://your-bucket</value> 
</property> 

core-site.xml

<property> 
    <name>fs.s3n.awsAccessKeyId</name> 
    <value>Your AWS Key</value> 
</property> 

<property> 
    <name>fs.s3n.awsSecretAccessKey</name> 
    <value>Your AWS Secret Key</value> 
</property> 

This is per the 3.1 documentation: http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hive

under "Using an external file system in Hive".

I don't see it in the 3.2 docs. Not sure why they omitted it, but it looks like something essential for running Hive against S3.
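A rough sketch of why the setting matters, assuming (as the stack trace's `Context.getScratchDir` frames suggest) that Hive places its scratch directory under the default filesystem; the helper below is hypothetical and only mimics the URI composition, not Hadoop's actual code:

```python
from urllib.parse import urlparse, urlunparse

def scratch_uri(default_fs: str, path: str) -> str:
    """Hypothetical helper: place a scratch path under the default filesystem."""
    base = urlparse(default_fs)
    return urlunparse(base._replace(path=path))

# With a cfs:// default, the node IP sits in the authority slot and can
# later be mistaken for an S3 bucket name:
print(scratch_uri("cfs://10.226.118.113/", "/tmp/hive"))  # cfs://10.226.118.113/tmp/hive

# With the s3n:// default from this answer, the scratch dir lands in the bucket:
print(scratch_uri("s3n://your-bucket/", "/tmp/hive"))     # s3n://your-bucket/tmp/hive
```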


This seems to work for me. I had the access keys in there; I just didn't have the default name set to `s3n://my-bucket/`. It was set to `cfs://local-ip/`, so I wonder whether that causes any problems that just haven't been noticed yet. –

0

The Hadoop implementation of the S3 file system is out of date, so writing data from Hive to S3 doesn't work well. We fixed the problem for reads, so DSE can now read S3 files, but writes still have issues. We'll check whether it can be fixed soon.


What is the problem with writing data to S3? –
