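All dates in the database are in GMT, whereas Sqoop automatically uses the local time zone (Asia/Russian Federation) for incremental updates. Setting mapreduce.map.java.opts="-Duser.timezone=GMT" has no effect.
Sqoop probably picks the time zone up from the JVM, but I need it to use GMT for some jobs and the local time zone for others. How can I solve this?
The thread at https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-s-metastore-timezone/td-p/16306
discusses the same problem. Is there a practical workaround? The solution given in that thread does not work for me.
Here is the Sqoop job I have: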
sqoop job -D oracle.sessionTimeZone=GMT -D mapred.child.java.opts=" -Duser.timezone=GMT" \
  --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME \
  -- import --driver com.mysql.jdbc.Driver \
  --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull \
  --username root --password 'PASSWORD' --table TABLE_NAME \
  --incremental lastmodified --check-column updated_at --last-value 0 \
  --merge-key entity_id --split-by entity_id --target-dir LOCATION_SPECIFIED \
  --hive-database Magento --hive-drop-import-delims \
  --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'
Log:
5459 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5497 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0
5817 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
5832 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5859 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
5874 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
5874 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
6306 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
6330 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
6434 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce
9911 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/51c9a7f9e76b0547825eb7a852721bf9/sales_flat_order.jar
9928 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Incremental import based on column updated_at
9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Lower bound value: '0'
9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Upper bound value: '2016-06-30 11:40:36.0'
9943 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of sales_flat_order
9962 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0
10007 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
10672 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
10674 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(entity_id), MAX(entity_id) FROM sales_flat_order WHERE (updated_at >= '0' AND updated_at < '2016-06-30 11:40:36.0')
11667 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.IntegerSplitter - Split size: 86592; Num splits: 4 from: 1 to: 346372
Heart beat
42986 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 300.3027 MB in 32.9683 seconds (9.1088 MB/sec)
42995 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 339510 records.
43008 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Saving incremental import state to the metastore
43224 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Updated data for job: sales_flat_order
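To make the mismatch concrete, here is a minimal sketch of the arithmetic, not of Sqoop's internals. It assumes that the upper bound in the log above ('2016-06-30 11:40:36.0') is a wall-clock value in a local zone at UTC+05:30; that offset is an assumption borrowed from the 'Asia/Kolkata' example in the comments, and the names `LOCAL` and `render` are purely illustrative.

```python
from datetime import datetime, timedelta, timezone

GMT = timezone.utc
# Assumption: the local zone is UTC+05:30 (e.g. Asia/Kolkata); adjust as needed.
LOCAL = timezone(timedelta(hours=5, minutes=30))
FMT = "%Y-%m-%d %H:%M:%S"


def render(instant, tz):
    """Format a timezone-aware datetime as a wall-clock string in tz."""
    return instant.astimezone(tz).strftime(FMT)


# Upper bound from the log, assumed here to be local wall-clock time.
upper = datetime(2016, 6, 30, 11, 40, 36, tzinfo=LOCAL)

as_local = render(upper, LOCAL)  # the string that ends up in the WHERE clause
as_gmt = render(upper, GMT)      # what the same instant looks like in GMT

# updated_at is stored in GMT, so comparing it with the local string
# moves the import window forward by the zone offset.
gap = datetime.strptime(as_local, FMT) - datetime.strptime(as_gmt, FMT)

print("bound as local wall-clock:", as_local)
print("same instant in GMT:      ", as_gmt)
print("window shift:             ", gap)
```

Under that assumption, the saved last value runs ahead of the GMT values in updated_at by the zone offset, so rows updated within that window after the run could be skipped by the next incremental import.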
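So, if Sqoop is stuck with a specific time zone, why not turn the problem around and force your Oracle **session** to use the same TZ by default when parsing strings in queries, with something like 'export TZ=Asia/Kolkata'? (I searched for the Java property 'oracle.sessionTimeZone' but found nothing. Where did you find that?)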
Here. This is from the official Sqoop documentation: "By default, Sqoop will specify the timezone "GMT" to Oracle. You can override this setting by specifying a Hadoop property oracle.sessionTimeZone on the command-line when running a Sqoop job. For example:
$ sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
    --connect jdbc:oracle:thin:@//db.example.com/foo --table bar"
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive
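@SamsonScharfrichter: I am trying to import data from a production database. All dates in its updated_at column, which is the column we specify for incremental imports in Sqoop, are in GMT. I need to use GMT only for some Sqoop jobs. Could you explain whether your suggestion would still work in this case?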