Using Python's pandas to process AWS DynamoDB data

I'm fetching data from a DynamoDB table using boto3 on Python 2.7, and I'd like to use pandas to group and sort the data.

Unfortunately, the DynamoDB data format is rather complex. It looks like this:

data = [{
    u'permaname': {
        u'S': u'facebook'
    },
    u'uuid': {
        u'S': u'4b873085-c995-4ce4-9325-cfc70fcd4040'
    },
    u'tags': {
        u'L': []
    },
    u'type': {
        u'S': u'xxxxxx'
    },
    u'createdOn': {
        u'N': u'1502099627'
    },
    u'source': {
        u'S': u'xxxxxxx'
    },
    u'data': {
        u'NULL': True
    },
    u'crawler': {
        u'S': u'xxxxxxx'
    }
}, {
    u'permaname': {
        u'S': u'facebook'
    },
    u'uuid': {
        u'S': u'25381aef-a7db-4b79-b599-89fd060fcf73'
    },
    u'tags': {
        u'L': []
    },
    u'type': {
        u'S': u'xxxxxxx'
    },
    u'createdOn': {
        u'N': u'1502096901'
    },
    u'source': {
        u'S': u'xxxxxxx'
    },
    u'data': {
        u'NULL': True
    },
    u'crawler': {
        u'S': u'xxxxxxx'
    }
}]

To do my grouping and sorting, I have to create a pandas object, and I don't know how to do that.

This is what I'm trying:

obj = pandas.DataFrame(data) 
print list(obj.sort_values(['createdOn'],ascending=False).groupby('source'))

If I print obj like this:

print list(obj)

I get:

[u'crawler', u'createdOn', u'data', u'permaname', u'source', u'tags', u'type', u'uuid']

Does anyone know how to create a DataFrame object from DynamoDB data?

Source

2017-08-11 PAscalinox

To convert DynamoDB JSON to regular JSON, you can use this library:

https://github.com/Alonreznik/dynamodb-json
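
If adding a dependency is not an option, the type descriptors can also be stripped by hand. The sketch below is a minimal, standard-library-only illustration and not the library's implementation: the helper names `unmarshal_value` and `unmarshal_item` are made up for this example, and it only handles the S, N, BOOL, NULL, L, M, SS and NS descriptors. (boto3 itself ships `boto3.dynamodb.types.TypeDeserializer`, which is probably the better choice when boto3 is already in use.)

```python
from decimal import Decimal


def unmarshal_value(attr):
    """Convert one DynamoDB attribute value, e.g. {'S': 'facebook'}, to a plain Python value."""
    # Each attribute value is a single-entry dict: {type_descriptor: payload}.
    (kind, payload), = attr.items()
    if kind == 'S':
        return payload
    if kind == 'N':
        # Numbers arrive as strings; Decimal avoids silent precision loss.
        return Decimal(payload)
    if kind == 'BOOL':
        return payload
    if kind == 'NULL':
        return None
    if kind == 'L':
        return [unmarshal_value(v) for v in payload]
    if kind == 'M':
        return {k: unmarshal_value(v) for k, v in payload.items()}
    if kind == 'SS':
        return set(payload)
    if kind == 'NS':
        return {Decimal(n) for n in payload}
    raise ValueError('unsupported DynamoDB type descriptor: %r' % kind)


def unmarshal_item(item):
    """Convert a whole item (attribute name -> attribute value) to a plain dict."""
    return {name: unmarshal_value(attr) for name, attr in item.items()}


# Hypothetical sample item, shortened from the question's data.
sample = {
    'permaname': {'S': 'facebook'},
    'createdOn': {'N': '1502099627'},
    'tags': {'L': []},
    'data': {'NULL': True},
}
plain = unmarshal_item(sample)
```

A list of such plain dicts, for example `[unmarshal_item(i) for i in data]`, can then be passed directly to `pandas.DataFrame(...)`.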

Source

2017-08-11 21:02:54

I'll try this in Python 3:

data = [{
    'permaname': {
        'S': 'facebook'
    },
    'uuid': {
        'S': '4b873085-c995-4ce4-9325-cfc70fcd4040'
    },
    'tags': {
        'L': []
    },
    'type': {
        'S': 'xxxxxx'
    },
    'createdOn': {
        'N': '1502099627'
    },
    'source': {
        'S': 'xxxxxxx'
    },
    'data': {
        'NULL': True
    },
    'crawler': {
        'S': 'xxxxxxx'
    }
}, {
    'permaname': {
        'S': 'facebook'
    },
    'uuid': {
        'S': '25381aef-a7db-4b79-b599-89fd060fcf73'
    },
    'tags': {
        'L': []
    },
    'type': {
        'S': 'xxxxxxx'
    },
    'createdOn': {
        'N': '1502096901'
    },
    'source': {
        'S': 'xxxxxxx'
    },
    'data': {
        'NULL': True
    },
    'crawler': {
        'S': 'xxxxxxx'
    }
}]

This answer uses dynamodb_json, as suggested earlier.

import pandas as pd
from dynamodb_json import json_util as json

# json_util.loads strips the DynamoDB type descriptors ('S', 'N', 'L', ...)
obj = pd.DataFrame(json.loads(data))
obj

With the output:

   crawler   createdOn  data  permaname   source  tags     type                                  uuid
0  xxxxxxx  1502099627  None   facebook  xxxxxxx    []   xxxxxx  4b873085-c995-4ce4-9325-cfc70fcd4040
1  xxxxxxx  1502096901  None   facebook  xxxxxxx    []  xxxxxxx  25381aef-a7db-4b79-b599-89fd060fcf73

Grouping (I use max() to aggregate the results):

obj.sort_values(['createdOn'],ascending=False).groupby('source').max()

With the output:

         crawler   createdOn  data  permaname  tags     type                                  uuid
source
xxxxxxx  xxxxxxx  1502099627   NaN   facebook    []  xxxxxxx  4b873085-c995-4ce4-9325-cfc70fcd4040
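
One thing to keep in mind: `groupby(...).max()` takes the maximum of each column independently, so the resulting row can mix values from different items. In the output above, `type` comes from the second item while `uuid` and `createdOn` come from the first. If the goal is the newest complete row per source, something like the following sketch may be closer to what is wanted. The rows are made-up sample data with already-converted plain values, and `latest` is just an illustrative name.

```python
import pandas as pd

# Made-up sample rows for illustration (plain values, no DynamoDB type descriptors).
rows = [
    {'source': 'a', 'uuid': 'u1', 'createdOn': 1502099627},
    {'source': 'a', 'uuid': 'u2', 'createdOn': 1502096901},
    {'source': 'b', 'uuid': 'u3', 'createdOn': 1502090000},
]
df = pd.DataFrame(rows)

# Sort newest first, then keep the first (i.e. newest) whole row of each group.
latest = (
    df.sort_values('createdOn', ascending=False)
      .groupby('source')
      .head(1)
)
```

Unlike `max()`, `head(1)` returns existing rows unchanged, so every column in a result row belongs to the same item.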

Printing the list of columns:

print(list(obj))

Output:

[u'crawler', u'createdOn', u'data', u'permaname', u'source', u'tags', u'type', u'uuid']

I hope this helps.

Source

2018-03-06 14:27:18
