2016-09-22 24 views
3

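Parsing a dictionary in a pandas DataFrame cell into new row cells (new columns)

I have a pandas DataFrame with a column whose cells each contain a dictionary of key:value pairs, for example: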

{"name":"Test Thorton","company":"Test Group","address":"10850 Test #325\r\n","city":"Test City","state_province":"CA","postal_code":"95670","country":"USA","email_address":"[email protected]","phone_number":"999-888-3333","equipment_description":"I'm a big red truck\r\n\r\nRSN# 0000","response_desired":"week","response_method":"email"} 

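I'm trying to parse the dictionary so that the resulting DataFrame contains a new column for each key, with each row filled in with the corresponding values, like this: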

//Before 

1 2 3 4 5 
a b c d {6:y, 7:v} 

//After 

1 2 3 4 5   6 7 
a b c d {6:y, 7:v} y v 

Any suggestions are much appreciated.

Answers

3

I think you can use concat:

import pandas as pd

df = pd.DataFrame({1:['a','h'],2:['b','h'], 5:[{6:'y', 7:'v'},{6:'u', 7:'t'}] }) 

print (df) 
    1 2     5 
0 a b {6: 'y', 7: 'v'} 
1 h h {6: 'u', 7: 't'} 

print (df.loc[:,5].values.tolist()) 
[{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}] 

df1 = pd.DataFrame(df.loc[:,5].values.tolist()) 
print (df1) 
    6 7 
0 y v 
1 u t 

print (pd.concat([df, df1], axis=1)) 
    1 2     5 6 7 
0 a b {6: 'y', 7: 'v'} y v 
1 h h {6: 'u', 7: 't'} u t 
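
One caveat worth keeping in mind: pd.concat with axis=1 aligns rows on the index, while a frame built from a plain list gets a fresh 0..n-1 index. A minimal sketch, assuming the original frame may carry a non-default index (the labels 'x' and 'y' are made up for illustration), passes the original index along so the expanded columns line up, and uses pop to remove the dictionary column in the same step:

```python
import pandas as pd

# Same data as above, but with a hypothetical non-default index.
df = pd.DataFrame(
    {1: ['a', 'h'], 2: ['b', 'h'], 5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]},
    index=['x', 'y'],
)

# pop removes column 5 from df and returns it as a Series.
dicts = df.pop(5)

# Reuse the original index so concat aligns row by row.
expanded = pd.DataFrame(dicts.tolist(), index=dicts.index)

result = pd.concat([df, expanded], axis=1)
print(result)
```

If the dictionary column should stay in the result, df[5] can be used in place of df.pop(5).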

Timings (len(df)=2k):

In [2]: %timeit (pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1)) 
100 loops, best of 3: 2.99 ms per loop 

In [3]: %timeit (pir(df)) 
1 loop, best of 3: 625 ms per loop 

Code used for the timings:

df = pd.concat([df]*1000).reset_index(drop=True) 

print (pd.concat([df, pd.DataFrame(df.loc[:,5].values.tolist())], axis=1)) 


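def pir(df): 
    # expand each dict in column 5 into the new columns F and G
    df[['F', 'G']] = df[5].apply(pd.Series) 
    # drop returns a new frame, so return its result instead of discarding it
    return df.drop(5, axis=1) 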

print (pir(df))  
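
The %timeit lines above require IPython. A rough sketch of the same comparison in plain Python, assuming the standard-library timeit module and two helper functions named here only for illustration (expand_concat and expand_apply are hypothetical names, and expand_apply is a non-mutating variant of pir), might look like this. Absolute numbers depend on the machine and the pandas version, so none are shown:

```python
import timeit
import pandas as pd

small = pd.DataFrame({1: ['a', 'h'], 2: ['b', 'h'],
                      5: [{6: 'y', 7: 'v'}, {6: 'u', 7: 't'}]})
# Repeat the two rows 1000 times to get 2000 rows.
big = pd.concat([small] * 1000).reset_index(drop=True)

def expand_concat(frame):
    # Build the new columns from the list of dicts in one go.
    new_cols = pd.DataFrame(frame[5].tolist(), index=frame.index)
    return pd.concat([frame, new_cols], axis=1)

def expand_apply(frame):
    # Build the new columns row by row via apply(pd.Series).
    return pd.concat([frame, frame[5].apply(pd.Series)], axis=1)

# Time three runs of each approach on the 2000-row frame.
t_concat = timeit.timeit(lambda: expand_concat(big), number=3)
t_apply = timeit.timeit(lambda: expand_apply(big), number=3)
print(f"concat: {t_concat:.4f}s  apply: {t_apply:.4f}s")
```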
3

Consider df:

df = pd.DataFrame([ 
     ['a', 'b', 'c', 'd', dict(F='y', G='v')], 
     ['a', 'b', 'c', 'd', dict(F='y', G='v')], 
    ], columns=list('ABCDE')) 

df 



Use apply(pd.Series):

df.E.apply(pd.Series) 



Assign the result back like this:

df[['F', 'G']] = df.E.apply(pd.Series) 
df.drop('E', axis=1) 

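
Note that drop returns a new frame rather than changing df in place, so its result has to be kept if it is needed later. A small sketch that puts the steps together, using join so that the new column names come from the dictionary keys instead of being typed out by hand:

```python
import pandas as pd

df = pd.DataFrame([
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
    ['a', 'b', 'c', 'd', dict(F='y', G='v')],
], columns=list('ABCDE'))

# Expand each dictionary in column E into its own columns.
expanded = df['E'].apply(pd.Series)

# drop returns a copy without E; join adds the expanded columns by index.
result = df.drop('E', axis=1).join(expanded)
print(result)
```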