2016-03-30

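Hive Metastore with Kerberos: "Clock skew too great" error

For a while now we have been hitting the problem in the title about once a month. The ntpd service is installed and running on the metastore node, so its clock stays in sync with the Kerberos server. The krb5.conf on that node looks like this: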

[libdefaults]
default_realm = EXAMPLE.COM
dns_lookup_realm = true
dns_lookup_kdc = true
ticket_lifetime = 24h
renew_lifetime = 7d
forwardable = true

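Given that, it seems unlikely that the metastore's clock drifted 5 minutes or more away from the Kerberos server's because of an ntpd problem or network congestion.
The metastore log also records the "Clock skew too great" exceptions out of chronological order. For example: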
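The 5-minute figure is the Kerberos clock-skew tolerance. It is commonly 300 seconds by default and can be changed with the clockskew setting in krb5.conf. The sketch below is a simplified model of the check, written for Java 8+. It is an assumption-laden illustration, not the JDK's actual Kerberos code. In this model, the server compares the timestamp in the client's authenticator against its own clock at the moment it processes the request. That means a request left waiting long enough before processing can fail even when both clocks are correct.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Simplified model (an assumption, not the JDK implementation) of the
 * clock-skew check a Kerberos acceptor performs on an incoming request.
 */
public class ClockSkewCheck {
    // Assumed tolerance: the common 300-second default ("5 minutes").
    static final Duration MAX_SKEW = Duration.ofMinutes(5);

    /**
     * Returns true if the request would be rejected with "Clock skew too great".
     * authenticatorTime: timestamp the client wrote when it built the request.
     * serverNow: server clock at the moment the request is actually processed.
     */
    static boolean skewTooGreat(Instant authenticatorTime, Instant serverNow) {
        Duration skew = Duration.between(authenticatorTime, serverNow).abs();
        return skew.compareTo(MAX_SKEW) > 0;
    }

    public static void main(String[] args) {
        Instant sent = Instant.parse("2016-01-16T18:59:00Z");
        // Clocks in sync, request handled within 2 seconds: accepted.
        System.out.println(skewTooGreat(sent, sent.plusSeconds(2)));            // false
        // Clocks still in sync, but the request waited 6 minutes for a worker: rejected.
        System.out.println(skewTooGreat(sent, sent.plus(Duration.ofMinutes(6)))); // true
    }
}
```

In this model the clocks never disagree. The second call fails only because processing was delayed.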

2016-01-16 18:18:48,071 ERROR [pool-3-thread-63735]
2016-01-16 19:07:03,699 ERROR [pool-3-thread-63798]
2016-01-16 19:06:55,998 ERROR [pool-3-thread-63796]
2016-01-16 19:06:41,653 ERROR [pool-3-thread-63812]
2016-01-16 19:04:28,659 ERROR [pool-3-thread-63806]
2016-01-16 19:04:13,937 ERROR [pool-3-thread-63804]
2016-01-16 19:02:19,312 ERROR [pool-3-thread-63809]
2016-01-16 19:02:13,115 ERROR [pool-3-thread-63794]
2016-01-16 19:02:06,028 ERROR [pool-3-thread-63800]
2016-01-16 19:01:50,767 ERROR [pool-3-thread-63795]
2016-01-16 18:59:36,926 ERROR [pool-3-thread-63810]
2016-01-16 18:59:36,394 ERROR [pool-3-thread-63797]

Exception stack trace:

 
2016-01-16 18:59:36,394 ERROR [pool-3-thread-63797]: transport.TSaslTransport (TSaslTransport.java:open(296)) - SASL negotiation failure 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Clock skew too great (37))] 
     at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:177) 
     at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:509) 
     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:264) 
     at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) 
     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$HiveSaslServerTransportFactory.getTransport(HadoopThriftAuthBridge.java:172) 
     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:678) 
     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge20S.java:675) 
     at java.security.AccessController.doPrivileged(Native Method) 
     at javax.security.auth.Subject.doAs(Subject.java:356) 
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1536) 
     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge20S.java:675) 
     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189) 
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
     at java.lang.Thread.run(Thread.java:744) 
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Clock skew too great (37)) 
     at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125) 
     at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253) 
     at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) 
     at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$HiveSaslServerTransportFactory.getTransport(HadoopThriftAuthBridge.java:172) 
     ... 10 more 

ENV:

 
java version "1.7.0_45" 
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) 
hive-0.13.1.2.1.10.0-hdp 

How should I go about finding the root cause? Any suggestions? Thanks a lot.


Have you tried synchronizing the time with NTP? –


Yes, we tried that before. When the exception occurred, it turned out the metastore was busy responding to requests, and we had to restart it. – Dengzh


Have you checked the NecronoKerberoMicon, i.e. https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html (for common error messages) and https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/secrets.html (for debugging)? –

Answer


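I have seen this error too, and in my case the root cause had nothing to do with Kerberos. If you use MySQL as the metastore database, Hive has a very bad memory leak, https://issues.apache.org/jira/browse/HIVE-15551. It was introduced in 0.13 and was not fixed until Hive 1.3.0. The person who originally wrote the code either forgot or did not realize that JDBC statements have to be closed explicitly. As a result, the process runs into excessive garbage collection once it reaches its memory limit. From then on, everything in the process slows down more and more until you start seeing these clock skew errors.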
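The Java sketch below contrasts leaving a statement open with closing it through try-with-resources. It is not the HIVE-15551 patch. FakeStatement is a hypothetical stand-in for java.sql.Statement, so the example runs without a MySQL server. A counter stands in for the statement objects that pile up on the heap.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustration only: FakeStatement is a hypothetical stand-in for a JDBC
 * Statement. OPEN counts instances that were created but never closed.
 */
public class StatementLeakDemo {
    static final AtomicInteger OPEN = new AtomicInteger();

    static class FakeStatement implements AutoCloseable {
        FakeStatement() { OPEN.incrementAndGet(); }
        void executeQuery(String sql) { /* pretend to run the query */ }
        // Narrower than AutoCloseable.close(): throws no checked exception.
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    /** Leaky pattern: the statement is used but never closed. */
    static void leaky(String sql) {
        FakeStatement stmt = new FakeStatement();
        stmt.executeQuery(sql);
    }

    /** Fixed pattern: try-with-resources closes the statement even if the query throws. */
    static void closed(String sql) {
        try (FakeStatement stmt = new FakeStatement()) {
            stmt.executeQuery(sql);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) leaky("SELECT 1");
        System.out.println("open after leaky calls:  " + OPEN.get()); // 1000
        OPEN.set(0);
        for (int i = 0; i < 1000; i++) closed("SELECT 1");
        System.out.println("open after closed calls: " + OPEN.get()); // 0
    }
}
```

Real JDBC objects follow the same pattern, with one difference: Statement.close() and ResultSet.close() declare SQLException, so the enclosing method has to handle or propagate it. Declaring the ResultSet in the same try header closes it too.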

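To tell whether this is your problem, run a jmap histogram on the metastore process (jmap -histo <pid>). If JDBC objects are at the top of the list (in my case com.mysql.jdbc.JDBC42ResultSet and com.mysql.jdbc.StatementImpl), you are probably hitting this issue. I would suggest applying the patch, upgrading to Hive 1.3.0, or trying the workaround mentioned in the issue to see whether it clears the problem.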
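To check the histogram with a script instead of by eye, you could use a small Java helper like the one below. It scans saved jmap -histo output and reports which com.mysql.jdbc classes appear among the top rows. jmap sorts the rows by total bytes. Several details are assumptions for illustration: the row format the regular expression expects, the HistoCheck name, and the default of 20 rows. Adjust them to match what your JDK's jmap prints.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HistoCheck {
    // Assumed row shape: "   1:     500000   80000000  com.mysql.jdbc.StatementImpl".
    // Group 1 = #instances, group 2 = #bytes, group 3 = class name.
    private static final Pattern ROW =
        Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)");

    /** Class names among the first topN data rows that start with the given prefix. */
    static List<String> suspiciousTopClasses(List<String> histoLines, int topN, String prefix) {
        List<String> hits = new ArrayList<>();
        int rows = 0;
        for (String line : histoLines) {
            Matcher m = ROW.matcher(line);
            if (!m.find()) continue;          // skip header, separator and "Total" lines
            if (rows++ >= topN) break;        // only look at the top of the histogram
            String className = m.group(3);
            if (className.startsWith(prefix)) hits.add(className);
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java HistoCheck <jmap -histo output file> [topN]");
            return;
        }
        int topN = args.length > 1 ? Integer.parseInt(args[1]) : 20;
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> hits = suspiciousTopClasses(lines, topN, "com.mysql.jdbc.");
        if (hits.isEmpty()) {
            System.out.println("No com.mysql.jdbc classes in the top " + topN + " rows.");
        } else {
            System.out.println("Possible JDBC leak, top rows include: " + hits);
        }
    }
}
```

Usage: save the histogram with jmap -histo <metastore pid> > histo.txt, then run java HistoCheck histo.txt 10.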