Complete scan of DynamoDB with boto3

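My table is around 220MB with 250k records in it. I'm trying to pull all of this data into Python. I realize this needs to be a chunked batch process that is looped through, but I'm not sure how to set each batch to start where the previous one left off.

Is there some way to filter my scan? From what I read, filtering occurs after loading, and the loading stops at 1MB, so I wouldn't actually be able to scan in new objects.

Any assistance would be appreciated.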

import boto3 
dynamodb = boto3.resource('dynamodb', 
    aws_session_token = aws_session_token, 
    aws_access_key_id = aws_access_key_id, 
    aws_secret_access_key = aws_secret_access_key, 
    region_name = region 
    ) 

table = dynamodb.Table('widgetsTableName') 

data = table.scan()

2016-04-21 CJ_Spaz

It turns out that boto3 returns the "LastEvaluatedKey" as part of the response. This can be used as the starting point for the next scan:

data= table.scan(
    ExclusiveStartKey=data['LastEvaluatedKey'] 
)

I plan on building a loop around this that runs until the returned data is only the ExclusiveStartKey.

2016-04-21 21:36:42

boto3 offers paginators that handle all the pagination details for you. Here is the doc page for the scan paginator. Basically, you would use it like so:

import boto3 

client = boto3.client('dynamodb') 
paginator = client.get_paginator('scan') 

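# paginate() takes the same arguments as scan; TableName is required
for page in paginator.paginate(TableName='widgetsTableName'): 
    # do something with page['Items']
    pass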

2016-04-21 23:02:39

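Note that the items in `page['Items']` may not be what you're expecting: since this paginator is very generic, what you get back for each DynamoDB item is a dictionary of the form type: value, e.g. `{'myAttribute': {'M': {}}, 'yourAttribute': {'N': u'132457'}}` for a row with an empty map and a numeric type (which is returned as a string that needs to be cast; I suggest `decimal.Decimal` for this, since it already accepts a string and will handle non-integer numbers). Other types, e.g. strings, maps, and booleans, are converted to their Python types by boto. – kungphu

Is there a scan filter or FilterExpression with pagination? – vnpnlz

Paginators would be great, if it weren't for the issue @kungphu raised. I don't see the use for something that does one useful thing but negates it by polluting the response data with irrelevant metadata. –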
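
To make the typed format from kungphu's comment concrete, here is a minimal hand-rolled sketch that turns one typed item into plain Python values. The helper name `from_dynamodb` is hypothetical and not part of boto3, and it only covers a handful of type tags; it is meant to show the shape of the data, not to replace a real deserializer.

```python
from decimal import Decimal


def from_dynamodb(value):
    """Convert one typed attribute value, e.g. {'N': '1'}, to plain Python."""
    (type_tag, raw), = value.items()  # a typed value holds exactly one key
    if type_tag == 'N':
        return Decimal(raw)           # numbers arrive as strings
    if type_tag in ('S', 'BOOL'):
        return raw                    # already str / bool
    if type_tag == 'NULL':
        return None
    if type_tag == 'M':
        return {k: from_dynamodb(v) for k, v in raw.items()}
    if type_tag == 'L':
        return [from_dynamodb(v) for v in raw]
    raise ValueError('unsupported type tag: %r' % type_tag)


# Sample item in the low-level format quoted in the comment above.
item = {'myAttribute': {'M': {}}, 'yourAttribute': {'N': '132457'}}
plain = {name: from_dynamodb(value) for name, value in item.items()}
print(plain)
```

`Decimal` is used for numbers, as suggested in the comment, because it accepts the string form directly and keeps non-integer values exact.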

Code for removing the DynamoDB format types, as @kungphu mentioned.

import boto3 

from boto3.dynamodb.types import TypeDeserializer 
from boto3.dynamodb.transform import TransformationInjector 

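client = boto3.client('dynamodb') 
# The question is about scans, so use the Scan paginator and operation model
paginator = client.get_paginator('scan') 
# _service_model is a private attribute and may change between versions
service_model = client._service_model.operation_model('Scan') 
trans = TransformationInjector(deserializer=TypeDeserializer()) 
for page in paginator.paginate(TableName='widgetsTableName'): 
    # Rewrites page['Items'] in place from typed values to plain Python values
    trans.inject_attribute_value_output(page, service_model)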

2016-07-21 07:41:07 Vincent

Bravo! This negates my earlier comment about the lack of utility of paginators. Thanks! Why isn't this the default behavior? –

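I think the Amazon DynamoDB documentation regarding table scanning answers your question.

In short, you'll need to check for LastEvaluatedKey in the response. Here is an example using your code: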

import boto3 
dynamodb = boto3.resource('dynamodb', 
          aws_session_token=aws_session_token, 
          aws_access_key_id=aws_access_key_id, 
          aws_secret_access_key=aws_secret_access_key, 
          region_name=region 
) 

table = dynamodb.Table('widgetsTableName') 

response = table.scan() 
data = response['Items'] 

while 'LastEvaluatedKey' in response: 
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey']) 
    data.extend(response['Items'])

2016-07-27 17:18:11

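While this may work, note that the [boto3 documentation](http://boto3.readthedocs.io/en/latest/reference/services/dynamodb.html#DynamoDB.Client.query) states _If LastEvaluatedKey is empty, then the "last page" of results has been processed and there is no more data to be retrieved._ So the test I'm using is `while response.get('LastEvaluatedKey')` rather than `while 'LastEvaluatedKey' in response`, simply because "is empty" doesn't necessarily mean "is not present", and this works in either case. – kungphu

A paginator is a more convenient way to iterate over queried/scanned items. – iuriisusuk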
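
As a rough sketch of the `response.get('LastEvaluatedKey')` test from kungphu's comment, the loop can be wrapped in a small generator that stops whether the key is missing or empty. `scan_all` and `fake_scan` are hypothetical names; `fake_scan` is only a stand-in for `table.scan`, so the loop can be run without AWS.

```python
def scan_all(scan, **kwargs):
    """Yield every item, calling scan(**kwargs) until no start key comes back."""
    while True:
        response = scan(**kwargs)
        for item in response.get('Items', []):
            yield item
        last_key = response.get('LastEvaluatedKey')
        if not last_key:  # covers both a missing key and an empty one
            break
        kwargs['ExclusiveStartKey'] = last_key


# Hypothetical two-page table; the last page carries an EMPTY key on purpose.
_PAGES = {
    None: {'Items': [{'id': 1}, {'id': 2}], 'LastEvaluatedKey': {'id': 2}},
    2: {'Items': [{'id': 3}], 'LastEvaluatedKey': {}},
}


def fake_scan(ExclusiveStartKey=None, **_ignored):
    """Stand-in for table.scan: returns the page that follows the given key."""
    start = ExclusiveStartKey['id'] if ExclusiveStartKey else None
    return _PAGES[start]


items = list(scan_all(fake_scan))
print(items)
```

With a real table this would presumably be called as `scan_all(table.scan)`; any extra keyword arguments are forwarded unchanged to every call.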

Riffing off of Jordon Phillips's answer, here's how you'd pass a FilterExpression in with the pagination:

import boto3 

client = boto3.client('dynamodb') 
paginator = client.get_paginator('scan') 
operation_parameters = { 
    'TableName': 'foo', 
    'FilterExpression': 'bar > :x AND bar < :y', 
    'ExpressionAttributeValues': { 
        ':x': {'S': '2017-01-31T01:35'}, 
        ':y': {'S': '2017-01-31T02:08'}, 
    } 
} 

page_iterator = paginator.paginate(**operation_parameters) 
for page in page_iterator: 
    # do something with page['Items']
    pass
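
On the original worry about filtering and the 1MB limit: as far as I understand it, the filter is applied to each page after it has been read, so a page can come back with few or even zero items while later pages still contain matches. The sketch below just walks every page and collects whatever survived the filter. `matching_items` and `fake_pages` are hypothetical; the fake pages stand in for what `paginator.paginate(**operation_parameters)` would yield.

```python
def matching_items(pages):
    """Yield the items from every page, including pages that came back empty."""
    for page in pages:
        yield from page.get('Items', [])


# Hypothetical pages: the middle one was read but nothing passed the filter.
fake_pages = [
    {'Items': [{'bar': {'S': '2017-01-31T01:40'}}], 'Count': 1, 'ScannedCount': 500},
    {'Items': [], 'Count': 0, 'ScannedCount': 500},
    {'Items': [{'bar': {'S': '2017-01-31T02:00'}}], 'Count': 1, 'ScannedCount': 250},
]

found = list(matching_items(fake_pages))
print(len(found))
```

The point is simply not to stop at the first empty page; the paginator keeps going until the table has been fully read.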

2017-01-31 17:55:04
