
Sqoop 1.3 export to RDBMS: .gz/.lzo files larger than 64 MB get loaded in duplicate

I am trying to export HDFS output to a MySQL table with Sqoop 1.3.

Everything works fine when loading uncompressed files, including one larger than 300 MB.

But when I load a 75 MB or 79 MB compressed file (.gz or .lzo), I see the rows loaded into the table doubled. This does not happen when the compressed file is under 60 MB, so I am guessing it is related to the 64 MB block size (a block-count check is sketched at the end of this post). Here are the steps I ran:

bash-3.2$ ls -ltr 
-rw-r--r-- 1 bhargavn bhargavn 354844413 Nov 16 02:27 large_file 
-rw-rw-r-- 1 bhargavn bhargavn 15669507 Nov 21 03:41 small_file.lzo 
-rw-rw-r-- 1 bhargavn bhargavn 75173037 Nov 21 03:46 large_file.lzo 

bash-3.2$ wc -l large_file 
247060 large_file 

bash-3.2$ sqoop export --connect 'jdbc:mysql://db.com/test?zeroDateTimeBehavior=round&rewriteBatchedStatements=true' 
--table table_with_large_data 
--username sqoopuser 
--password sqoop 
--export-dir /user/bhargavn/workspace/data/sqoop-test/large_file.lzo 
--fields-terminated-by '\001' -m 1 
[21/11/2012:05:52:28 PST] main  INFO org.apache.hadoop.mapred.JobClient: map 0% reduce 0% 
[21/11/2012:05:57:03 PST] main  INFO com.cloudera.sqoop.mapreduce.ExportJobBase:  Transferred 143.3814 MB in 312.2832 seconds (470.1584 KB/sec) 
[21/11/2012:05:57:03 PST] main  INFO com.cloudera.sqoop.mapreduce.ExportJobBase:  Exported 494120 records. 

mysql> select count(1) from table_with_large_data; 
+----------+ 
| count(1) | 
+----------+ 
| 494120 | 
+----------+ 
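
Note that 494120 is exactly twice the 247060 lines in large_file. A quick sanity check I could run (sketch only; assuming the table has a unique key column from the source data, here a hypothetical id):

mysql> -- sketch: 'id' is a hypothetical unique column; compare total vs. distinct rows 
mysql> select count(*) as total_rows, count(distinct id) as distinct_rows from table_with_large_data; 

If every row really went in twice, total_rows should be exactly 2 * distinct_rows.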

mysql> truncate table_with_large_data; 
bash-3.2$ sqoop export --connect 'jdbc:mysql://db.com/test?zeroDateTimeBehavior=round&rewriteBatchedStatements=true' 
--table table_with_large_data 
--username sqoopuser 
--password sqoop 
--export-dir /user/bhargavn/workspace/data/sqoop-test/large_file 
--fields-terminated-by '\001' 
-m 1 
[21/11/2012:06:05:35 PST] main  INFO org.apache.hadoop.mapred.JobClient: map 0%  reduce 0% 
[21/11/2012:06:08:06 PST] main  INFO org.apache.hadoop.mapred.JobClient: map 100%  reduce 0% 
[21/11/2012:06:08:06 PST] main  INFO com.cloudera.sqoop.mapreduce.ExportJobBase:  Transferred 338.4573 MB in 162.5891 seconds (2.0817 MB/sec) 
[21/11/2012:06:08:06 PST] main  INFO com.cloudera.sqoop.mapreduce.ExportJobBase:  Exported 247060 records. 
mysql> select count(1) from table_with_large_data; 
+----------+ 
| count(1) | 
+----------+ 
| 247060 | 
+----------+ 
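
Since the doubling only shows up once the compressed file crosses roughly 64 MB, I suspect the .lzo file spans two HDFS blocks and each split ends up being exported in full. A check I still need to run (sketch only, same file path as above) to see how many blocks the file actually occupies:

bash-3.2$ hadoop fsck /user/bhargavn/workspace/data/sqoop-test/large_file.lzo -files -blocks 

If fsck reports two blocks for large_file.lzo while small_file.lzo shows only one, that would line up with the duplicated row counts.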

Answer