2014-03-13

Using Sqoop to migrate MS SQL Server tables to Hive

I need to create a Hive table that joins several tables located in SQL Server.

What I have done:

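I used Sqoop to import the tables/query results into HDFS without any problem. Then I created an external table in Hive that points at the HDFS folder. To keep the table data up to date, I also need to add a crontab entry that deletes the HDFS contents and runs the same sqoop command again.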
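The refresh cycle described above (delete the HDFS folder behind the external table, then re-run the same import from cron) could look roughly like the following. This is only a sketch: the HDFS path, the script path in the crontab comment and the DRY_RUN switch are hypothetical, and the connection string and table name are the placeholders from the question's own command. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the cron-driven refresh: delete the HDFS folder that the Hive
# external table points at, then run the same Sqoop import again.
# TARGET_DIR and the crontab line below are hypothetical placeholders.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
# Example crontab entry:
#   0 2 * * * DRY_RUN=0 /path/to/refresh_shipments.sh
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
TARGET_DIR="${TARGET_DIR:-/data/sqlserver/shipment_cross_reference}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

refresh() {
  # Remove the previous snapshot; ignore the failure if it does not exist yet.
  # (Older Hadoop 1.x releases spell this `hadoop fs -rmr`.)
  run hadoop fs -rm -r "$TARGET_DIR" || true
  # Re-import into the same folder, so the external table sees the new files.
  run sqoop import \
    --driver net.sourceforge.jtds.jdbc.Driver \
    --connect 'jdbc:jtds:sqlserver://ip:1433/db;user=user;password=password' \
    --table '[db$Shipment Cross Reference]' \
    --target-dir "$TARGET_DIR" \
    --fields-terminated-by '\001' \
    -m 1
}

refresh
```

Keeping the delete and the import in one script means cron only has a single entry to maintain, and the dry-run mode makes it easy to check what will be executed before enabling it.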

The solution above works, but it is a real PITA to maintain.

I have to write a Hive CREATE EXTERNAL TABLE query. It is a one-time job, I know, but you have to spell out every column.
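Spelling out every column in the CREATE EXTERNAL TABLE statement can at least be scripted. A minimal sketch, assuming you keep a compact `name:type` column list yourself; the helper name `make_external_ddl`, the table, the location and the columns are all hypothetical, and the `'\001'` delimiter matches the `--fields-terminated-by` value used in the question.

```shell
#!/usr/bin/env bash
# Sketch: print a Hive CREATE EXTERNAL TABLE statement from "name:type" pairs,
# so the column list does not have to be typed into the DDL by hand.
# make_external_ddl is a hypothetical helper, not part of Hive or Sqoop.
set -euo pipefail

# Usage: make_external_ddl TABLE HDFS_LOCATION name:type [name:type ...]
make_external_ddl() {
  local table="$1" location="$2"
  shift 2
  local sep="" spec
  printf 'CREATE EXTERNAL TABLE %s (\n' "$table"
  for spec in "$@"; do
    # "name:type" -> "  `name` type"; columns are joined with ",\n".
    printf '%s  `%s` %s' "$sep" "${spec%%:*}" "${spec#*:}"
    sep=$',\n'
  done
  printf '\n)\n'
  # Same field delimiter as --fields-terminated-by '\001' in the Sqoop import.
  printf '%s\n' "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\001'"
  printf "LOCATION '%s';\n" "$location"
}

make_external_ddl shipment_cross_reference \
  /data/sqlserver/shipment_cross_reference \
  shipment_no:STRING item_no:STRING quantity:DOUBLE
```

The printed statement could then be passed to `hive -e` once. The column types still have to be chosen by hand; this only removes the repetitive typing.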

I saw a post here where it looks like Sqoop can move data from SQL Server directly into Hive without you having to care about the column details.

I tried this:

sqoop import --driver 'net.sourceforge.jtds.jdbc.Driver' \
--connect 'jdbc:jtds:sqlserver://ip:1433/db;user=user;password=password' \
--direct \
--table "[db\$Shipment Cross Reference]" \
--hive-import \
--direct \
--hive-overwrite \
--create-hive-table \
--hive-table shipment_cross_reference \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' -m 1

But it fails with:

14/03/12 19:19:31 INFO mapred.JobClient: Job complete: job_201403051725_0059 
14/03/12 19:19:31 INFO mapred.JobClient: Counters: 6 
14/03/12 19:19:31 INFO mapred.JobClient: Job Counters 
14/03/12 19:19:31 INFO mapred.JobClient:  Failed map tasks=1 
14/03/12 19:19:31 INFO mapred.JobClient:  Launched map tasks=4 
14/03/12 19:19:31 INFO mapred.JobClient:  Total time spent by all maps in occupied slots (ms)=33406 
14/03/12 19:19:31 INFO mapred.JobClient:  Total time spent by all reduces in occupied slots (ms)=0 
14/03/12 19:19:31 INFO mapred.JobClient:  Total time spent by all maps waiting after reserving slots (ms)=0 
14/03/12 19:19:31 INFO mapred.JobClient:  Total time spent by all reduces waiting after reserving slots (ms)=0 
14/03/12 19:19:31 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead 
14/03/12 19:19:31 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 355.1399 seconds (0 bytes/sec) 
14/03/12 19:19:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 
14/03/12 19:19:31 INFO mapreduce.ImportJobBase: Retrieved 0 records. 
14/03/12 19:19:31 ERROR tool.ImportTool: Error during import: Import job failed! 

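Can anyone tell me whether it is possible to move data from SQL Server into Hive in one go with a single Sqoop command? A write-up would be great, or something that also partitions by date, etc.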

Or what does the error mean? Thanks!
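For the "partition by date" part, Sqoop's Hive import has `--hive-partition-key` and `--hive-partition-value`, which load one import run into a single static partition. A rough sketch, assuming Sqoop's built-in SQL Server connector (`jdbc:sqlserver` URL, with Microsoft's JDBC driver jar available to Sqoop); the table `shipments`, the column `ShipmentDate`, the partition key `ds` and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a single Sqoop run that imports one day of rows directly into a
# date partition of a Hive table. Connection string, table, column
# (ShipmentDate) and partition key (ds) are hypothetical placeholders.
# DRY_RUN=1 (default) only prints the command; DRY_RUN=0 runs it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

import_day() {
  local day="$1"
  # Accept only YYYY-MM-DD so the value is safe to splice into --where.
  if [[ ! "$day" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    printf 'bad date: %s\n' "$day" >&2
    return 1
  fi
  local cmd=(
    sqoop import
    --connect 'jdbc:sqlserver://ip:1433;databaseName=db'
    --username user --password password
    --table shipments
    --where "ShipmentDate >= '$day' AND ShipmentDate < DATEADD(day, 1, '$day')"
    --hive-import
    --hive-table shipments
    --hive-partition-key ds
    --hive-partition-value "$day"
    -m 1
  )
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

import_day 2014-03-12
```

Run once per day (for example from the same crontab), each call fills exactly one `ds` partition, so older partitions do not have to be deleted and re-imported.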

Answer

Please share the entire Sqoop output generated with the --verbose parameter, together with the log of the failed map task.
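One way to capture that output is to re-run the question's command with `--verbose` and copy everything Sqoop prints into a file. A sketch under these assumptions: the log file name and DRY_RUN switch are hypothetical, and the arguments are the question's own (with the duplicated `--direct` listed once). The failed map task's own log is separate from this client output; on an MRv1 cluster like the one in the log (job `job_201403051725_0059`) it is normally reachable through the JobTracker web UI.

```shell
#!/usr/bin/env bash
# Sketch: re-run the question's import with --verbose and keep everything
# Sqoop prints in a file that can be shared. LOG is a hypothetical file name.
# DRY_RUN=1 (default) only prints the pipeline; DRY_RUN=0 runs it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
LOG="${LOG:-sqoop-import-verbose.log}"

verbose_import() {
  local args=(
    import --verbose
    --driver net.sourceforge.jtds.jdbc.Driver
    --connect 'jdbc:jtds:sqlserver://ip:1433/db;user=user;password=password'
    --direct
    --table '[db$Shipment Cross Reference]'
    --hive-import --hive-overwrite --create-hive-table
    --hive-table shipment_cross_reference
    --fields-terminated-by '\001'
    --lines-terminated-by '\n'
    -m 1
  )
  if [ "$DRY_RUN" = "1" ]; then
    printf 'sqoop %s 2>&1 | tee %s\n' "${args[*]}" "$LOG"
  else
    # Hadoop/Sqoop log lines usually go to stderr, so merge it before tee.
    sqoop "${args[@]}" 2>&1 | tee "$LOG"
  fi
}

verbose_import
```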

Sqoop has built-in support for Microsoft SQL Server, so I wonder whether there is a reason you are not using it and are going through the jTDS driver instead?
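For comparison, a sketch of the same Hive import through the built-in SQL Server support. It assumes Microsoft's SQL Server JDBC driver jar has been placed in Sqoop's lib directory; without `--driver`, Sqoop selects its SQL Server connector from the `jdbc:sqlserver` URL. Host, database, credentials, `--target-dir` and the DRY_RUN switch are placeholders, and using a free-form `--query` for the bracketed table name is my own choice rather than something from the thread.

```shell
#!/usr/bin/env bash
# Sketch: the same Hive import through Sqoop's built-in SQL Server support.
# Assumes Microsoft's SQL Server JDBC driver jar is in Sqoop's lib directory.
# No --driver flag, so Sqoop chooses its connector from the jdbc:sqlserver URL.
# Host, database, credentials and --target-dir are placeholders.
# DRY_RUN=1 (default) only prints the command; DRY_RUN=0 runs it.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

builtin_import() {
  # A free-form query avoids quoting the bracketed table name; with --query,
  # Sqoop requires the literal $CONDITIONS token and an explicit --target-dir.
  local cmd=(
    sqoop import
    --connect 'jdbc:sqlserver://ip:1433;databaseName=db'
    --username user --password password
    --query 'SELECT * FROM [db$Shipment Cross Reference] WHERE $CONDITIONS'
    --target-dir /tmp/shipment_cross_reference
    --hive-import --hive-overwrite
    --hive-table shipment_cross_reference
    -m 1
  )
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

builtin_import
```

`--direct` and the jTDS `--driver` are left out so the plain built-in connector is used, and `--create-hive-table` is dropped because it makes the job fail when the Hive table already exists, which would break a scheduled re-run.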