2011-01-31 53 views
2

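Uploading a zipped file from Elastic MapReduce to S3

I want to upload a directory from the EMR local file system to S3 as a zipped file.

Is there a better way to approach this than the method I'm currently using?

Is it possible to return a ZipOutputStream as the Reducer output?

Thanks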

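import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Example call:
zipFolderAndUpload("target", "target.zip", "s3n://bucketpath/");
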
static public void zipFolderAndUpload(String srcFolder, String zipFile, String dst) throws Exception { 

    //Zips a directory 
    FileOutputStream fileWriter = new FileOutputStream(zipFile); 
    ZipOutputStream zip = new ZipOutputStream(fileWriter); 
    addFolderToZip("", srcFolder, zip); 
    zip.flush(); 
    zip.close(); 

    // Copies the zipped file to the S3 filesystem.
    // The destination is the bucket path plus the zip file name.
    InputStream in = new BufferedInputStream(new FileInputStream(zipFile));
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(dst + zipFile), conf);
    OutputStream out = fs.create(new Path(dst + zipFile));
    IOUtils.copyBytes(in, out, 4096, true); // true: close both streams when done

} 

static private void addFileToZip(String path, String srcFile, ZipOutputStream zip) throws Exception { 

    File folder = new File(srcFile); 
    if (folder.isDirectory()) { 
     addFolderToZip(path, srcFile, zip); 
    } else { 
     byte[] buf = new byte[1024];
     int len;
     // try-with-resources closes the input file even if a write fails
     try (FileInputStream in = new FileInputStream(srcFile)) {
      zip.putNextEntry(new ZipEntry(path + "/" + folder.getName()));
      while ((len = in.read(buf)) > 0) {
       zip.write(buf, 0, len);
      }
      zip.closeEntry();
     }
    } 
} 

static private void addFolderToZip(String path, String srcFolder, ZipOutputStream zip) throws Exception { 
    File folder = new File(srcFolder); 

    for (String fileName : folder.list()) { 
     if (path.equals("")) { 
      addFileToZip(folder.getName(), srcFolder + "/" + fileName, zip); 
     } else { 
      addFileToZip(path + "/" + folder.getName(), srcFolder + "/" + fileName, zip); 
     } 
    } 
} 

Answer

4

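The approach you're taking looks fine. If you find it too slow because it is single-threaded, you could create your own Hadoop OutputFormat implementation that writes zip files.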
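To make the OutputFormat idea a bit more concrete, here is a minimal sketch of the part that would do the actual work: a writer that turns each record into one zip entry. It uses only the JDK so it can be run anywhere. The class name `ZipRecordWriterSketch` and its `write(String, byte[])` signature are made up for illustration. In a real job you would presumably extend Hadoop's `RecordWriter` and hand it the stream returned by `FileSystem.create(...)`, which is an assumption about how you would wire it up, not something shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: mimics the write/close shape of a record writer,
// but depends only on the JDK.
class ZipRecordWriterSketch implements AutoCloseable {
    private final ZipOutputStream zip;

    // 'out' can be any OutputStream. On EMR it could be a stream opened on
    // the S3 destination (assumption), which avoids the local temp file.
    ZipRecordWriterSketch(OutputStream out) {
        this.zip = new ZipOutputStream(out);
    }

    // Writes one record as one zip entry: key = entry name, value = contents.
    void write(String key, byte[] value) throws IOException {
        zip.putNextEntry(new ZipEntry(key));
        zip.write(value, 0, value.length);
        zip.closeEntry();
    }

    // Finishes the archive (writes the central directory) and closes the stream.
    @Override
    public void close() throws IOException {
        zip.close();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ZipRecordWriterSketch writer = new ZipRecordWriterSketch(sink)) {
            writer.write("part-0/a.txt", "hello".getBytes(StandardCharsets.UTF_8));
            writer.write("part-0/b.txt", "world".getBytes(StandardCharsets.UTF_8));
        }
        // Read the archive back and list the entry names that were written.
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(sink.toByteArray()))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                System.out.println(e.getName());
            }
        }
    }
}
```

Each writer produces its own archive, so with several reducers you would end up with one zip file per reducer rather than a single combined one.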

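One thing to watch out for is that the Java SE ZipOutputStream implementation (as of Java 6) does not support Zip64, which means it cannot produce ZIP files larger than 4GB. There are other ZIP implementations that do, such as TrueZIP.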
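If you want to know up front whether a directory is likely to hit that limit, a rough pre-check could look like the sketch below. The class and method names are made up for illustration. The check is deliberately conservative: it sums the uncompressed file sizes, although the limit really applies to the sizes and offsets stored in the archive, and those are usually smaller after compression. The thresholds come from the classic ZIP format's 32-bit size and offset fields and its 16-bit entry count.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for a rough "will this need Zip64?" check.
class Zip64CheckSketch {
    // Classic (non-Zip64) ZIP stores sizes and offsets in 32-bit fields and
    // the entry count in a 16-bit field; the all-ones values are reserved
    // as "look in the Zip64 record" markers, hence the >= comparisons below.
    static final long ZIP64_SIZE_MARKER = 0xFFFFFFFFL; // 4 GiB - 1
    static final int ZIP64_COUNT_MARKER = 0xFFFF;      // 65,535

    // Pure arithmetic check on a byte total and an entry count.
    static boolean exceedsClassicZipLimits(long totalBytes, long entryCount) {
        return totalBytes >= ZIP64_SIZE_MARKER || entryCount >= ZIP64_COUNT_MARKER;
    }

    // Walks the directory, sums the sizes of regular files (uncompressed),
    // and applies the check above. Conservative: may report true for data
    // that would compress below the limit.
    static boolean mayNeedZip64(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            List<Path> files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            long total = 0;
            for (Path f : files) {
                total += Files.size(f);
            }
            return exceedsClassicZipLimits(total, files.size());
        }
    }
}
```

For example, `Zip64CheckSketch.mayNeedZip64(Paths.get("target"))` could be called before zipping, and the job could then split the output into several archives or switch to a Zip64-capable library.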

+0

That's great, thanks for the tip. – patrickandroid 2011-02-10 08:39:52