Source DF:
In [172]: df
Out[172]:
                         id                               attributes                       attr2
0  255RSSSTCHL-QLTDGLZD-BLK  {"color":"Black","hardware":"Goldtone"}  {"aaa":"aaa", "bbb":"bbb"}
1   C3ACCRDNFLP-QLTDS-S-BLK         {"size":"Small","color":"Black"}               {"ccc":"ccc"}
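Solution 1:
import ast
import pandas as pd

attr_cols = ['attributes', 'attr2']

def f(df, attr_col):
    # pop() removes the string column from df; each string is parsed
    # into a dict and expanded into one column per key
    return df.join(df.pop(attr_col)
                     .apply(lambda x: pd.Series(ast.literal_eval(x))))

for col in attr_cols:
    df = f(df, col)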
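Solution 2 (thanks to @DYZ for the hint):
import json
import pandas as pd

attr_cols = ['attributes', 'attr2']

def f(df, attr_col):
    # Same approach as Solution 1, but parsing with json.loads,
    # which requires the strings to be valid JSON
    return df.join(df.pop(attr_col)
                     .apply(lambda x: pd.Series(json.loads(x))))

for col in attr_cols:
    df = f(df, col)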
Result:
In [175]: df
Out[175]:
                         id  color  hardware   size  aaa  bbb  ccc
0  255RSSSTCHL-QLTDGLZD-BLK  Black  Goldtone    NaN  aaa  bbb  NaN
1   C3ACCRDNFLP-QLTDS-S-BLK  Black       NaN  Small  NaN  NaN  ccc
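Which of the two parsers applies depends on how the strings are written. The sketch below is only an illustration: the `in_stock` key, both sample strings, and the `parses` helper are made up for the example and are not part of the original data. It shows that JSON literals such as `true` are rejected by `ast.literal_eval`, while Python-style single-quoted dicts are rejected by `json.loads`. The strings in the source DF happen to be valid for both.

```python
import ast
import json

# Hypothetical sample strings (not from the original DF)
json_style = '{"color":"Black","in_stock":true}'
python_style = "{'color': 'Black', 'in_stock': True}"

def parses(parser, text):
    """Return True if parser(text) succeeds, False if it rejects the text."""
    try:
        parser(text)
        return True
    except (ValueError, SyntaxError):
        # json.JSONDecodeError is a subclass of ValueError
        return False

for name, text in [("json_style", json_style), ("python_style", python_style)]:
    print(name,
          "json.loads:", parses(json.loads, text),
          "ast.literal_eval:", parses(ast.literal_eval, text))
```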
Timing for a 20,000-row DF:
In [198]: df = pd.concat([df] * 10**4, ignore_index=True)
In [199]: df.shape
Out[199]: (20000, 3)
In [201]: %paste
def f_ast(df, attr_col):
    return df.join(df.pop(attr_col)
                     .apply(lambda x: pd.Series(ast.literal_eval(x))))

def f_json(df, attr_col):
    return df.join(df.pop(attr_col)
                     .apply(lambda x: pd.Series(json.loads(x))))
## -- End pasted text --
In [202]: %%timeit
     ...: for col in attr_cols:
     ...:     f_ast(df.copy(), col)
     ...:
1 loop, best of 3: 33.1 s per loop
In [203]:
In [203]: %%timeit
     ...: for col in attr_cols:
     ...:     f_json(df.copy(), col)
     ...:
1 loop, best of 3: 30 s per loop
In [204]: df.shape
Out[204]: (20000, 3)
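Both timings above are close to each other, which suggests that much of the cost may come from building one pd.Series per row rather than from the parsing itself. A possible variation, not part of the original answer and not timed here, is to parse all strings into a list of dicts and build the new columns with a single pd.DataFrame call. The function name `expand_json_column` is hypothetical, and the sketch assumes every string is valid JSON:

```python
import json
import pandas as pd

# Rebuild the two-row source DF from the question
df = pd.DataFrame({
    "id": ["255RSSSTCHL-QLTDGLZD-BLK", "C3ACCRDNFLP-QLTDS-S-BLK"],
    "attributes": ['{"color":"Black","hardware":"Goldtone"}',
                   '{"size":"Small","color":"Black"}'],
    "attr2": ['{"aaa":"aaa", "bbb":"bbb"}', '{"ccc":"ccc"}'],
})

def expand_json_column(df, attr_col):
    # Hypothetical helper: parse every string once, then build all new
    # columns in one DataFrame constructor call (missing keys become NaN)
    parsed = [json.loads(s) for s in df[attr_col]]
    expanded = pd.DataFrame(parsed, index=df.index)
    # Unlike the pop()-based version, this does not modify the input df
    return df.drop(columns=attr_col).join(expanded)

out = df
for col in ["attributes", "attr2"]:
    out = expand_json_column(out, col)

print(out)
```

Because the helper uses drop() instead of pop(), the input frame is left unchanged, so no df.copy() is needed when calling it repeatedly.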
If the dictionaries are also valid JSON objects, then 'json.loads' is about 5% faster than 'ast.literal_eval'. – DyZ
@DYZ, I've added a timing - for that DF it was about 10% faster ;) – MaxU