
Efficient way to pull files and metadata from Amazon S3?

Is there a more efficient way to list the files in an Amazon S3 bucket and fetch the metadata for each file? I'm using the AWS PHP SDK.

if ($paths = $s3->get_object_list('my-bucket')) {
    foreach ($paths as $path) {
        $meta = $s3->get_object_metadata('my-bucket', $path);
        echo $path . ' was modified on ' . $meta['LastModified'] . '<br />';
    }
}

At the moment I have to run get_object_list() to list all of the files, and then call get_object_metadata() for each file to fetch its metadata.

If my bucket contains 100 files, that makes 101 calls to retrieve this data. It would be great if it could be done in a single call.

For example:

if ($paths = $s3->get_object_list('my-bucket')) {
    foreach ($paths as $path) {
        echo $path['FileName'] . ' was modified on ' . $path['LastModified'] . '<br />';
    }
}

Using S3 objects to store 'files' is like using a whole 2 GB filesystem partition to store your Zork image. Put all of your metadata into a single object. And yes, 100 objects take 100 transactions. – starbolin

Answers

1

I ended up using the list_objects function and pulling out the last-modified metadata I needed.

All in one call :)
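
A minimal sketch of that approach, assuming the AWS SDK for PHP 1.x used in the question (the ListObjects response already carries a Key and LastModified value for every object, so no per-file metadata calls are needed):

require_once 'sdk.class.php'; // AWS SDK for PHP 1.x

$s3 = new AmazonS3();

// One ListObjects request returns up to 1000 objects, each with its
// Key, LastModified, ETag and Size.
$response = $s3->list_objects('my-bucket');

if ($response->isOK()) {
    foreach ($response->body->Contents as $object) {
        echo (string) $object->Key . ' was modified on ' . (string) $object->LastModified . '<br />';
    }
}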

2

I know this is a bit old, but I ran into this problem and, to solve it, I extended the AWS SDK to use its batch-request functionality for this type of problem. It retrieves custom metadata for a large number of files much faster. Here is my code:

/**
 * Name: Steves_Amazon_S3
 *
 * Extends the AmazonS3 class in order to create a function to
 * more efficiently retrieve a list of files and their custom
 * metadata using the CFBatchRequest function.
 */
class Steves_Amazon_S3 extends AmazonS3 {

    public function get_object_metadata_batch($bucket, $filenames, $opt = null) {
        $batch = new CFBatchRequest();

        foreach ($filenames as $filename) {
            // Queue a HEAD request for each object; the response headers
            // include standard and custom (x-amz-meta-*) metadata.
            $this->batch($batch)->get_object_headers($bucket, $filename);
        }

        // Send all queued HEAD requests in parallel.
        $response = $this->batch($batch)->send();

        // Fail if any requests were unsuccessful.
        if (!$response->areOK()) {
            return false;
        }

        $result = array();
        foreach ($response as $file) {
            $temp = array();
            $temp['name'] = (string) basename($file->header['_info']['url']);
            $temp['etag'] = (string) basename($file->header['etag']);
            $temp['size'] = $this->util->size_readable((integer) basename($file->header['content-length']));
            $temp['size_raw'] = basename($file->header['content-length']);
            $temp['last_modified'] = (string) date("jS M Y H:i:s", strtotime($file->header['last-modified']));
            $temp['last_modified_raw'] = strtotime($file->header['last-modified']);
            @$temp['creator_id'] = (string) $file->header['x-amz-meta-creator'];
            @$temp['client_view'] = (string) $file->header['x-amz-meta-client-view'];
            @$temp['user_view'] = (string) $file->header['x-amz-meta-user-view'];

            $result[] = $temp;
        }

        return $result;
    }
}
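
A hypothetical usage sketch (the bucket name is a placeholder, and the class assumes the AWS SDK for PHP 1.x is already loaded and configured):

$s3 = new Steves_Amazon_S3();

// get_object_list() returns an array of key names, which is exactly
// what get_object_metadata_batch() expects.
$keys  = $s3->get_object_list('my-bucket');
$files = $s3->get_object_metadata_batch('my-bucket', $keys);

if ($files !== false) {
    foreach ($files as $file) {
        echo $file['name'] . ' was modified on ' . $file['last_modified'] . '<br />';
    }
}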
2

You need to know that the list_objects function has a limit: it will not return more than 1000 objects, even if the max-keys option is set to a larger number.

To work around this, you have to load the data in several passes:

private function _getBucketObjects($prefix = '', $booOneLevelOny = false)
{
    $objects = array();
    $lastKey = null;
    do {
        $args = array();

        // Resume the listing after the last key from the previous page.
        if (isset($lastKey)) {
            $args['marker'] = $lastKey;
        }

        if (strlen($prefix)) {
            $args['prefix'] = $prefix;
        }

        // A delimiter of '/' limits the listing to one "folder" level.
        if ($booOneLevelOny) {
            $args['delimiter'] = '/';
        }

        $res = $this->_client->list_objects($this->_bucket, $args);
        if (!$res->isOK()) {
            return null;
        }

        foreach ($res->body->Contents as $object) {
            $objects[] = $object;
            $lastKey = (string) $object->Key;
        }

        // S3 sets IsTruncated to 'true' while more pages remain.
        $isTruncated = (string) $res->body->IsTruncated;
        unset($res);
    } while ($isTruncated == 'true');

    return $objects;
}

As a result, you have the complete list of objects.
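
A minimal usage sketch, assuming it is called from inside the same class that owns the _client and _bucket members above (the prefix is a placeholder, and the field names come from the standard ListObjects response):

// _getBucketObjects() returns the SimpleXML <Contents> nodes collected above.
$objects = $this->_getBucketObjects('uploads/');

foreach ($objects as $object) {
    // Each node carries Key, LastModified, ETag and Size from the listing.
    echo (string) $object->Key . ' was modified on ' . (string) $object->LastModified . '<br />';
}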


What if you also have some custom headers? They will not be returned by the list_objects function. In that case, this helps:

$arrHeaders = array();
foreach (array_chunk($arrObjects, 1000) as $object_set) {
    $batch = new CFBatchRequest();
    foreach ($object_set as $object) {
        // Skip "folder" placeholder keys; only real objects get a HEAD request.
        if (!$this->isFolder((string) $object->Key)) {
            $this->_client->batch($batch)->get_object_headers($this->_bucket, $this->preparePath((string) $object->Key));
        }
    }

    // Send up to 1000 HEAD requests per batch.
    $response = $this->_client->batch($batch)->send();

    if ($response->areOK()) {
        foreach ($response as $arrHeaderInfo) {
            // Headers include the custom x-amz-meta-* values that list_objects omits.
            $arrHeaders[] = $arrHeaderInfo->header;
        }
    }
    unset($batch, $response);
}
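
The snippet assumes two helper methods, isFolder() and preparePath(), that are not shown in the answer. A plausible sketch of what they might look like (these are guesses, not the author's implementations):

// Hypothetical helpers assumed by the batch snippet above.
private function isFolder($key)
{
    // S3 "folders" are usually zero-byte placeholder keys ending in '/'.
    return substr($key, -1) === '/';
}

private function preparePath($key)
{
    // Pass the key through unchanged; adjust here if keys need a prefix
    // stripped or other normalization before the HEAD request.
    return $key;
}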

This is exactly what I needed. Thank you so much! – CoreDumpError


@CoreDumpError You're welcome :) I like this site because the questions are interesting and the answers can be quite different and useful! – Andron