2016-06-30

All dates in the database are in GMT, but Sqoop automatically uses the local time zone (Asia/Kolkata) for incremental updates. Setting mapreduce.map.java.opts="-Duser.timezone=GMT" has no effect.

It probably picks the time zone up from the JVM, but I need GMT for some jobs and the local time zone for others. How do I fix this?

This thread, https://community.cloudera.com/t5/Data-Ingestion-Integration/Sqoop-s-metastore-timezone/td-p/16306, discusses the same problem. Is there an actual workaround? The solution given in that thread did not work for me.

Here is the Sqoop job I have:

sqoop job -D oracle.sessionTimeZone=GMT -D mapred.child.java.opts="-Duser.timezone=GMT" \
  --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create JOB_NAME -- import \
  --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull \
  --username root --password 'PASSWORD' --table TABLE_NAME --incremental lastmodified \
  --check-column updated_at --last-value 0 --merge-key entity_id --split-by entity_id \
  --target-dir LOCATION_SPECIFIED --hive-database Magento --hive-drop-import-delims \
  --null-string '\\N' --null-non-string '\\N' --fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N' --input-fields-terminated-by '\001'
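One thing worth trying (an assumption on my part, not something the thread confirms): the lastmodified upper bound is computed by the Sqoop client JVM itself, so exporting TZ in the invoking shell may switch just that one job to GMT without touching any cluster-wide setting:

```shell
# Sketch, untested against a live cluster: run only this job's client JVM
# in GMT by exporting TZ before launching it. JOB_NAME is the job created
# above; the sqoop line is commented out since it needs a real cluster.
export TZ=GMT
date +%Z                      # sanity check: the shell now reports GMT
# sqoop job --exec JOB_NAME   # hypothetical: client inherits TZ=GMT
```

Jobs that should stay in the local zone would simply be launched without the export.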

Log:

5459 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. 
5497 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0 
5817 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead. 
5832 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration. 
5859 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time. 
5874 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000 
5874 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation 
6306 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 
6330 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 
6434 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce 
9911 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/51c9a7f9e76b0547825eb7a852721bf9/sales_flat_order.jar 
9928 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 
9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Incremental import based on column updated_at 
9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Lower bound value: '0' 
9941 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Upper bound value: '2016-06-30 11:40:36.0' 
9943 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of sales_flat_order 
9962 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM sales_flat_order AS t WHERE 1=0 
10007 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies. 
10672 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation 
10674 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(entity_id), MAX(entity_id) FROM sales_flat_order WHERE (updated_at >= '0' AND updated_at < '2016-06-30 11:40:36.0') 
11667 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.IntegerSplitter - Split size: 86592; Num splits: 4 from: 1 to: 346372 
Heart beat 
42986 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 300.3027 MB in 32.9683 seconds (9.1088 MB/sec) 
42995 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 339510 records. 
43008 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Saving incremental import state to the metastore 
43224 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Updated data for job: sales_flat_order 

So if Sqoop is stuck on a specific time zone, why not turn the problem around and force your Oracle **session** to default to the same TZ when it parses the strings in your queries, with something like `export TZ=Asia/Kolkata`? (I ran the Java property `oracle.sessionTimeZone` through a search engine and found nothing: where did you find that?!?) –


Here. It comes from the official Sqoop documentation: "By default, Sqoop will specify the timezone 'GMT' to Oracle. You can override this setting by specifying a Hadoop property oracle.sessionTimeZone on the command-line when running a Sqoop job. For example: $ sqoop import -D oracle.sessionTimeZone=America/Los_Angeles --connect jdbc:oracle:thin:@//db.example.com/foo --table bar" https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_importing_data_into_hive –


@SamsonScharfrichter: I am importing data from a production database. All of its dates are in GMT, including the updated_at column, which is the one we give Sqoop for the incremental import. I need GMT only for some Sqoop jobs. Could you explain whether your suggestion still works in that case? –
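Since the source here is actually MySQL rather than Oracle, oracle.sessionTimeZone may simply be ignored. A different knob to check (my assumption, not confirmed anywhere in this thread): MySQL Connector/J accepts time-zone hints directly on the JDBC URL, e.g.:

```
--connect "jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull&useLegacyDatetimeCode=false&serverTimezone=GMT"
```

Whether this helps depends on the Connector/J version in use; treat it as something to test, not a known fix.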

Answer


A possible workaround on the Oracle side could be:

  • Add a virtual column to your table
  • Use it to expose the original GMT datetime in Sqoop's local time zone, via a couple of CAST() and AT TIME ZONE conversions
  • Then, optionally, create an index on that virtual column and check whether Sqoop actually uses it (i.e. whether the gain is big enough to offset the cost of maintaining the index on INSERT)
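For illustration (using GNU date on an edge node, with Asia/Kolkata standing in for the local zone, which the thread never names exactly), this is the conversion such a virtual column would expose:

```shell
# Same instant, rendered twice: once as stored (GMT), once in the assumed
# local zone. This mirrors what the CAST()/AT TIME ZONE expressions would
# compute inside Oracle.
TZ=GMT date -d '2016-06-30 11:40:36 GMT' '+%Y-%m-%d %H:%M:%S'
TZ=Asia/Kolkata date -d '2016-06-30 11:40:36 GMT' '+%Y-%m-%d %H:%M:%S'
```

The second line comes out 5:30 ahead of the first, which is exactly the skew the question is fighting.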

Thanks @Samson. That's the thing: I don't have permission to change anything on the Oracle side, so I basically need to tweak Sqoop to use GMT instead of the local time zone. –


@Simran, have you considered a Spark script instead of a Sqoop job? Connect to Oracle over JDBC, run an arbitrary SELECT *(in parallel if needed)*, check the Max() value of the incremental column and store it somewhere *(e.g. a dummy HSQLDB, Sqoop-style, over a separate JDBC connection)* for the next SELECT, then store the data into a Hive table *(Spark has native support for CSV, Parquet and ORC, and can fall back to legacy Hive SerDes for other formats)*. That would probably be easier than patching and recompiling Sqoop. –


Thanks. I have never really looked at Spark since I am just getting started with big-data tools, but I will definitely check it out. Thanks :) –