I need to create Hive tables that join multiple tables living in SQL Server, so I am using Sqoop to migrate the MS SQL Server tables into Hive.
What I have done:
I used Sqoop to import the tables/query results into HDFS, and that worked fine. I then created an external table in Hive to point at the HDFS folder. To keep the table data current, I also had to add a crontab entry that deletes the HDFS contents and re-runs the same Sqoop command.
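The refresh workflow described above might look roughly like the sketch below. The script name, HDFS path, and cron schedule are assumptions for illustration; only the Sqoop flags mirror the command shown later in this question.

```shell
#!/bin/sh
# refresh_shipment.sh -- hypothetical refresh script for the workflow above.
# Drops the previous HDFS copy, then re-imports the table with Sqoop.

HDFS_DIR=/user/etl/shipment_cross_reference   # assumed target directory

hadoop fs -rm -r -skipTrash "$HDFS_DIR"

sqoop import \
  --driver 'net.sourceforge.jtds.jdbc.Driver' \
  --connect 'jdbc:jtds:sqlserver://ip:1433/db;user=user;password=password' \
  --table '[db$Shipment Cross Reference]' \
  --target-dir "$HDFS_DIR" \
  --fields-terminated-by '\001' \
  --lines-terminated-by '\n' -m 1

# The external table is created once, spelling out every column by hand --
# this is the tedious part (column list elided):
#   hive -e "CREATE EXTERNAL TABLE shipment_cross_reference (...)
#            ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
#            LOCATION '/user/etl/shipment_cross_reference';"

# crontab entry to refresh nightly at 02:00 (assumed schedule):
#   0 2 * * * /path/to/refresh_shipment.sh
```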
The above solution works, but it is a real PITA to maintain.
I also have to write a Hive CREATE EXTERNAL TABLE statement by hand; it is a one-time job, I know, but you have to spell out every single column.
I saw a post here where it looks like Sqoop can move data from SQL Server directly into Hive without caring about the column details.
So I tried this:
sqoop import --driver 'net.sourceforge.jtds.jdbc.Driver' \
  --connect 'jdbc:jtds:sqlserver://ip:1433/db;user=user;password=password' \
  --direct \
  --table "[db\$Shipment Cross Reference]" \
  --hive-import \
  --hive-overwrite \
  --create-hive-table \
  --hive-table shipment_cross_reference \
  --fields-terminated-by '\001' \
  --lines-terminated-by '\n' -m 1
But it fails with:
14/03/12 19:19:31 INFO mapred.JobClient: Job complete: job_201403051725_0059
14/03/12 19:19:31 INFO mapred.JobClient: Counters: 6
14/03/12 19:19:31 INFO mapred.JobClient: Job Counters
14/03/12 19:19:31 INFO mapred.JobClient: Failed map tasks=1
14/03/12 19:19:31 INFO mapred.JobClient: Launched map tasks=4
14/03/12 19:19:31 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=33406
14/03/12 19:19:31 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/03/12 19:19:31 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/12 19:19:31 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/12 19:19:31 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/03/12 19:19:31 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 355.1399 seconds (0 bytes/sec)
14/03/12 19:19:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/03/12 19:19:31 INFO mapreduce.ImportJobBase: Retrieved 0 records.
14/03/12 19:19:31 ERROR tool.ImportTool: Error during import: Import job failed!
Can anyone tell me whether it is possible to use Sqoop to move a table from SQL Server into Hive in one go? Appending new writes, or partitioning by date, etc., would be great.
Or, what does the error mean? Thanks!