2015-09-12 107 views
2
Sqoop export from HDFS to Oracle fails with an error

Command used:

sqoop export --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 --username xxx --password xxx --table TW5T0 --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' -m 8 --input-fields-terminated-by '\001' --lines-terminated-by '\n' --input-escaped-by '\"' --input-optionally-enclosed-by '\"' 

The target table in Oracle has columns of data type DATE, but as the error shows, a plain date value is being parsed as a timestamp.

Error:

15/09/11 06:07:12 INFO mapreduce.Job: map 0% reduce 0%
15/09/11 06:07:17 INFO mapreduce.Job: Task Id : attempt_1438142065989_99811_m_000000_0, Status : FAILED
Error: java.io.IOException: Can't export data, please check failed map task logs
     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
     at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: Can't parse input data: '2015-08-15'
     at TZ401.__loadFromFields(TZ401.java:792)
     at TZ401.parse(TZ401.java:645)
     at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
     ... 10 more
Caused by: java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
     at java.sql.Timestamp.valueOf(Timestamp.java:202)
     at TZ401.__loadFromFields(TZ401.java:709)
     ... 12 more

Answers

1

http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_dates_and_times

Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.

When exporting data back to a database, Sqoop parses text fields as TIMESTAMP types (with the form yyyy-mm-dd HH:MM:SS.ffffffff) even if you expect these fields to be formatted with the JDBC date escape format of yyyy-mm-dd. Dates exported to Oracle should be formatted as full timestamps.

So, before exporting to Oracle, you need to reformat the dates in your files to match the format yyyy-mm-dd HH:MM:SS.ffffffff.

Edit:

In response to the comment,

There around 70 files(tables) in hdfs I need to export..So,in all files I need to change the date from yyyy-mm-dd to yyyy-mm-dd HH:MM:SS.ffffffff, any simple way to format it.

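Well, you could write an awk script to do that for you. Otherwise, you can check whether the following idea works:

  1. Create a new temporary table TEMPIMPORT with the same structure as table TW5T0, except that the column with the DATE data type is changed to VARCHAR2.
  2. Load the data into the new temporary table TEMPIMPORT using Sqoop.
  3. Run the DML below to move the data from TEMPIMPORT into TW5T0 (and commit, of course):

    insert into tw5t0 (select [all_your_columns_here_except_date_column], TO_DATE(date_column, 'YYYY-MM-DD') from tempimport);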
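
For the awk route mentioned above, a minimal sketch might look like the following. It assumes the fields are separated by \001 as in the export command from the question, that the date fields are not wrapped in quotes, and that appending midnight (00:00:00) is acceptable for a DATE column. The function name add_midnight and the output directory in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads delimited text on stdin and appends " 00:00:00"
# to every field that consists of exactly a yyyy-mm-dd date, so that the
# value matches the "yyyy-mm-dd hh:mm:ss[.fffffffff]" form from the error.
# Assumption: fields are separated by \001 (Ctrl-A) and dates are unquoted.
add_midnight() {
  awk 'BEGIN { FS = OFS = "\001" }
  {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/)
        $i = $i " 00:00:00"
    print
  }'
}

# Possible usage (the target directory is a placeholder, not from the question):
#   hdfs dfs -cat '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18/*' \
#     | add_midnight \
#     | hdfs dfs -put - /some/new/export-dir/part-00000
```

Since the function only touches fields that are exactly a date, the same function could be reused over all 70 files without listing the date columns per table, at the cost of also rewriting any non-DATE text column that happens to hold a bare yyyy-mm-dd value.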

+0

There are around 70 files (tables) in HDFS that I need to export. So in all files I need to change the date from yyyy-mm-dd to yyyy-mm-dd HH:MM:SS.ffffffff. Is there any simple way to format it? –

+0

@vinayak Updated the answer in response to your comment. – toddlermenot

1

Use --connection-param-file ora.properties with sqoop export.

ora.properties contains oracle.jdbc.mapDateToTimestamp=false
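
Combined with the command from the question, this could look roughly like the sketch below. The wrapper function name export_tw5t0 is hypothetical and only there so the command can be read (and called) as one unit; host and credentials are the xxx placeholders from the question. I have not checked this against every Oracle JDBC driver version, so treat it as something to try.

```shell
#!/bin/sh
# Write the connection parameter file with the single property named above.
printf 'oracle.jdbc.mapDateToTimestamp=false\n' > ora.properties

# Hypothetical wrapper: the export command from the question with
# --connection-param-file added. It is only defined here, not executed.
export_tw5t0() {
  sqoop export \
    --connect jdbc:oracle:thin:@//xxx:1521/BDWDEV4 \
    --username xxx --password xxx \
    --connection-param-file ora.properties \
    --table TW5T0 \
    --export-dir '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18' \
    -m 8 \
    --input-fields-terminated-by '\001' \
    --lines-terminated-by '\n' \
    --input-escaped-by '\"' \
    --input-optionally-enclosed-by '\"'
}
```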

1

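Instead of changing your data files in Hadoop, you should use the --map-column-java argument in your sqoop export.

If, for example, you have two DATE columns named DATE_COLUMN_1 and DATE_COLUMN_2 in your Oracle table, you can add the following argument to your sqoop command: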

--map-column-java DATE_COLUMN_1=java.sql.Date,DATE_COLUMN_2=java.sql.Date 

As mentioned before, the JDBC format must be used in your Hadoop text file. But in this case, yyyy-mm-dd will work.
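
With many tables to export, typing the mapping by hand gets tedious. A small sketch of a helper that builds the --map-column-java value from a list of DATE column names is shown below; the function name date_mapping is hypothetical, and you would still have to supply the DATE column names for each table yourself.

```shell
#!/bin/sh
# Hypothetical helper: prints "COL1=java.sql.Date,COL2=java.sql.Date,..."
# for the column names given as arguments.
date_mapping() {
  mapping=""
  for col in "$@"; do
    # Prepend a comma only when the mapping already has an entry.
    mapping="${mapping:+$mapping,}$col=java.sql.Date"
  done
  printf '%s\n' "$mapping"
}

# Possible usage, appended to the rest of the export command:
#   sqoop export ... --map-column-java "$(date_mapping DATE_COLUMN_1 DATE_COLUMN_2)"
```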

0

Oracle drivers map oracle.sql.DATE to java.sql.Timestamp, retaining the time information. If you still want the incorrect but 10g compatible oracle.sql.DATE to java.sql.Date mapping, then you can get it by setting the value of mapDateToTimestamp flag to false (default is true).

https://docs.oracle.com/cd/E11882_01/java.112/e16548/apxref.htm#JJDBC28920

To use this with sqoop, you need to add the option:

--connection-param-file conn-param-file.txt

conn-param-file.txt:

oracle.jdbc.mapDateToTimestamp=false

0

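The same error can also occur if the column order of the Hive table does not match the column order of the RDBMS table.
I solved my problem by recreating the RDBMS table with the columns rearranged to match.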
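
To check for this kind of mismatch, it can help to print the first record of an export file with its field positions and compare them against the column order of the target table. A minimal sketch, assuming \001-separated fields as in the question; the function name show_fields is hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic: prints each field of the first input line as
# "position: value", so the positions can be compared with the table's
# column order. Assumes fields are separated by \001 (Ctrl-A).
show_fields() {
  head -n 1 | awk 'BEGIN { FS = "\001" }
  {
    for (i = 1; i <= NF; i++)
      printf "%d: %s\n", i, $i
  }'
}

# Possible usage:
#   hdfs dfs -cat '/data/raw/oltp/cogen/oraclexport/TW5T0/2015-08-18/*' | show_fields
```

If a value like 2015-08-15 shows up at a position where the table has a non-DATE column (or the other way round), the column order is the likely culprit.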