regexp_replace exception
I am trying to run a Hive query from the Cloudera Hue interface, and it works fine for a few hundred records. When I run it on a larger dataset, it fails. I searched the web and found many similar errors, but not the exact solution I am looking for. I am using regexp_replace in my Hive query, and I did not think it could cause any exception (my impression was that it handles strings and NULLs without trouble).
The error I get is java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 12
UPDATE: Here is the record that causes the problem:
columnA: READDATA (or ListDirectory)
columnB: ListDirectory)
columnC: NULL
columnD: NULL
My query: REGEXP_REPLACE(columnA, columnB, "") AS columnA, columnB, REGEXP_REPLACE(columnC, columnD, "") AS columnC,
Please let me know where I am going wrong.
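If columnB is fed straight to the regex engine as the pattern, the offending row alone should reproduce the error. A minimal sketch using the literal values from the record above (the exact index in the message depends on the stored value):

SELECT regexp_replace('READDATA (or ListDirectory)', 'ListDirectory)', '');
-- 'ListDirectory)' is compiled as a Java regex pattern; the unescaped ')'
-- raises java.util.regex.PatternSyntaxException: Unmatched closing ')'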
Here is the log ... the interesting part of it ...
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:565)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: DFSOutputStream is closed
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:620)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:474)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:800)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:546)
... 9 more
Caused by: java.io.IOException: DFSOutputStream is closed
at org.apache.hadoop.hdfs.DFSOutputStream.isClosed(DFSOutputStream.java:1239)
at org.apache.hadoop.hdfs.DFSOutputStream.writeChunk(DFSOutputStream.java:1407)
at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:161)
at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:104)
at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:90)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:54)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:86)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:606)
... 18 more
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 forwarded 90478 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 90478 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarded 90478 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 finished. closing...
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 forwarded 0 rows
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:90478
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2013-05-31 16:35:20,090 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 Close done
2013-05-31 16:35:20,090 INFO ExecMapper: ExecMapper: processed 90477 rows: used memory = 10815536
2013-05-31 16:35:20,097 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-05-31 16:35:20,099 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2013-05-31 16:35:20,099 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:552)
at org.apache.hadoop.hdfs.DFSInputStream.close(DFSInputStream.java:522)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at org.apache.hadoop.util.LineReader.close(LineReader.java:149)
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:195)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doClose(CombineHiveRecordReader.java:72)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.close(HadoopShimsSecure.java:273)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:223)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:422)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2013-05-31 16:35:20,102 WARN org.apache.hadoop.mapred.Task: Parent died. Exiting attempt_201305300036_0011_m_000000_1
Indeed, the columnB or columnD values need to be valid regexp patterns. You can escape them with '\(' or use another replace character. – Romain
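A minimal sketch of that escaping approach in HiveQL, assuming a placeholder table name my_table: the inner regexp_replace prefixes every Java-regex metacharacter in columnB with a backslash, so the outer call treats columnB as a literal string (the doubled backslashes are Hive string-literal escaping):

SELECT
  regexp_replace(
    columnA,
    -- turn e.g. 'ListDirectory)' into the literal pattern 'ListDirectory\)'
    regexp_replace(columnB, '([.^$|?*+(){}\\[\\]\\\\])', '\\\\$1'),
    ''
  ) AS columnA,
  columnB
FROM my_table;

Since regexp_replace should simply return NULL when any of its arguments is NULL, the same wrapping can be applied to the columnC/columnD pair without special-casing the NULL rows.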