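AWS S3 returns 404 for files that definitely exist there
We have some code that downloads a bunch of S3 files to a local directory. The list of files to retrieve comes from a query we run, and it lists only files that actually exist in our S3 bucket.
As we loop through retrieving these files, about 10% of them return a 404 error, as if the file did not exist. I logged the name/location of each failing file so I could go to S3 and check, and sure enough, every single one IS on S3 at the location we looked for it.
Why would S3 throw a 404 when the file exists?
Here is the Groovy code for the script.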
class RetrieveS3FilesFromCSVLoader implements Loader {
    private static String missingFilesFile = "00-MISSED_FILES.csv"
    private static String csvFileName = "/csv/s3file2.csv"
    private static String saveFilesToLocation = "/tmp/retrieve/"
    public static final char SEPARATOR = ','

    @Autowired
    DocumentFileService documentFileService

    private void readWithCommaSeparatorSQL() {
        int counter = 0
        String fileName
        String fileLocation
        // Keys that could not be downloaded are recorded in this file
        File missedFiles = new File(saveFilesToLocation + missingFilesFile)
        PrintWriter writer = new PrintWriter(missedFiles)
        File fileCSV = new File(getClass().getResource(csvFileName).toURI())
        fileCSV.splitEachLine(SEPARATOR as String) { nextLine ->
            //if (counter < 15) {
            // Skip the CSV header row
            if (nextLine != null && (nextLine[0] != 'FileLocation')) {
                counter++
                try {
                    //Remove 0, only if client number start with "0".
                    fileLocation = nextLine[0].trim()
                    byte[] fileBytes = documentFileService.getFile(fileLocation)
                    if (fileBytes != null) {
                        // Local name is everything after the first "/" in the key
                        fileName = fileLocation.substring(fileLocation.indexOf("/") + 1, fileLocation.length())
                        File file = new File(saveFilesToLocation + fileName)
                        file.withOutputStream {
                            it.write fileBytes
                        }
                        println "$counter) Wrote file ${fileLocation} to ${saveFilesToLocation + fileName}"
                    } else {
                        println "$counter) UNABLE TO RETRIEVE FILE ELSE: $fileLocation"
                        writer.println(fileLocation)
                    }
                } catch (Exception e) {
                    println "$counter) UNABLE TO RETRIEVE FILE: $fileLocation"
                    println(e.getMessage())
                    writer.println(fileLocation)
                }
            } else {
                counter++
            }
            //}
        }
        writer.close()
    }
}
Here is the code for the getFile(fileLocation) call used above, and for the client creation.
public byte[] getFile(String filename) throws IOException {
    AmazonS3Client s3Client = connectToAmazonS3Service();
    // try-with-resources closes the S3Object even if reading the content fails
    try (S3Object object = s3Client.getObject(S3_BUCKET_NAME, filename)) {
        if (object == null) {
            return null;
        }
        return IOUtils.toByteArray(object.getObjectContent());
    }
}

/**
 * Connects to Amazon S3
 *
 * @return instance of AmazonS3Client
 */
private AmazonS3Client connectToAmazonS3Service() {
    AWSCredentials credentials;
    try {
        credentials = new BasicAWSCredentials(S3_ACCESS_KEY_ID, S3_SECRET_ACCESS_KEY);
    } catch (Exception e) {
        throw new AmazonClientException(
                "Cannot load the credentials from the credential profiles file. " +
                "Please make sure that your credentials file is at the correct " +
                "location (~/.aws/credentials), and is in valid format.",
                e);
    }
    AmazonS3Client s3 = new AmazonS3Client(credentials);
    // The client is pinned to US_EAST_1
    Region usEast1 = Region.getRegion(Regions.US_EAST_1);
    s3.setRegion(usEast1);
    return s3;
}
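The code works for 90% of the files in the list passed to the script, but we know for a fact that 100% of the files exist in S3 at the location strings we pass.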
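Since the claim rests on the location strings matching the S3 keys exactly, one check worth making is whether a failing string from the CSV contains characters that look fine in a log line but differ from the stored key. The following is a minimal sketch of such a check, not a confirmed diagnosis: the class `KeyInspector` and its method are hypothetical names introduced here, and the assumption is only that `trim()` in the loop leaves some characters (such as a non-breaking space, U+00A0) in place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the original script): lists every code
// point in a key that is outside visible ASCII (0x21-0x7E).
public class KeyInspector {

    public static List<String> nonVisibleAscii(String key) {
        List<String> findings = new ArrayList<>();
        int index = 0;
        while (index < key.length()) {
            int cp = key.codePointAt(index);
            if (cp < 0x21 || cp > 0x7E) {
                findings.add(String.format(Locale.ROOT, "index %d: U+%04X", index, cp));
            }
            // Advance by one or two chars depending on the code point
            index += Character.charCount(cp);
        }
        return findings;
    }

    public static void main(String[] args) {
        String clean = "docs/report.pdf";
        // String.trim() only strips characters up to U+0020, so the
        // trailing non-breaking space survives it.
        String withNbsp = "docs/report.pdf\u00A0".trim();
        System.out.println(nonVisibleAscii(clean));
        System.out.println(nonVisibleAscii(withNbsp));
    }
}
```

Spaces and non-ASCII letters are legal in S3 keys, so a finding is only a prompt to compare the string against the key shown in the bucket, not proof of a mismatch. Running it over the entries in 00-MISSED_FILES.csv would show whether the failing 10% share anything the working 90% lack.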
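Were these files recently `PUT` into the bucket by another process, or is the list query retrieving files that have been in the bucket for a while, in which case they would not be affected by read-after-write visibility? –
These have been in S3 for months, and in our actual application I can fetch the files. The downloader script that has the problem uses the same download code as the single-file previewer in our application. – user1567291
Can you post some of the code for the loop and the `AmazonS3Client` usage? –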