Splitting a nested-list column of a pandas DataFrame into new named columns

I have a DataFrame (df) of the following form:

name alias col3 
mark david ['3109892828','[email protected]','123 main st'] 
john twixt ['5468392873','[email protected]','345 grand st']

What is a concise way to split col3 into new, named columns? (Perhaps using lambda and apply.)

Source

2015-09-18 DNburtonguster

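You can apply a join to the elements of each list to build a comma-separated string, then call the vectorized str.split with a comma separator and expand=True to create the new columns: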

In [12]: 
df[['UserID', 'email', 'address']] = df['col3'].apply(','.join).str.split(',', expand=True) 
df 

Out[12]: 
    alias          col3 name  UserID \ 
0 david [3109892828, [email protected], 123 main st] mark 3109892828 
1 twixt [5468392873, [email protected], 345 grand st] john 5468392873 

      email  address 
0 [email protected] 123 main st 
1 [email protected] 345 grand st

A cleaner approach is to apply the pd.Series constructor, which turns each list into a Series:

In [15]: 
df[['UserID', 'email', 'address']] = df['col3'].apply(pd.Series) 
df 

Out[15]: 
    alias          col3 name  UserID \ 
0 david [3109892828, [email protected], 123 main st] mark 3109892828 
1 twixt [5468392873, [email protected], 345 grand st] john 5468392873 

      email  address 
0 [email protected] 123 main st 
1 [email protected] 345 grand st

Source

2015-09-18 15:01:36 EdChum

This could cause trouble if a "column" legitimately contains a comma... maybe something like `df[['ID', 'email', 'address']] = df.col3.apply(pd.Series)` and then drop `col3`?

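Hmm, true. Unless the OP actually has commas in their data I don't think it's a problem, but applying the Series ctor is still cleaner and sufficient here. Will update, thanks – EdChum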
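To illustrate the concern raised in these comments, here is a minimal sketch. The email addresses and the "apt 4" suffix are hypothetical placeholders, not the asker's data. It shows that the join-then-split route produces an extra column when a field contains a comma, while applying pd.Series keeps each list element intact; col3 is dropped afterwards, as the comment suggests.

```python
import pandas as pd

# Hypothetical sample data; the first address deliberately contains a comma.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st, apt 4'],
        ['5468392873', 'john@example.com', '345 grand st'],
    ],
})

# Join-then-split: the embedded comma splits the first address in two,
# so the result has four columns instead of three.
via_split = df['col3'].apply(','.join).str.split(',', expand=True)

# pd.Series constructor: one column per list element, commas untouched.
cols = ['UserID', 'email', 'address']
df[cols] = df['col3'].apply(pd.Series)

# Drop the original nested column once it has been expanded.
df = df.drop(columns='col3')
```

With data like this, assigning via_split to three column names would fail because it has four columns, which is the scenario the comment warns about.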

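Normally this would be a fine solution, but it seems my arrays don't have the same number of columns in every row. So what should I do if the number of fields differs from record to record in the nested list? Here is the error I get: ValueError: Columns must be same length as key – DNburtonguster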
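One possible way to handle lists of differing lengths, sketched under assumptions: the sample values are hypothetical, and the `extra_N` fallback names are invented here for illustration. Building a DataFrame from the lists pads short rows with missing values, and the column names are then assigned to however many columns actually exist, which avoids the length mismatch.

```python
import pandas as pd

# Hypothetical sample data; the second record has no address.
df = pd.DataFrame({
    'name': ['mark', 'john'],
    'alias': ['david', 'twixt'],
    'col3': [
        ['3109892828', 'mark@example.com', '123 main st'],
        ['5468392873', 'john@example.com'],
    ],
})

# One row per list; shorter lists are padded with missing values.
expanded = pd.DataFrame(df['col3'].tolist(), index=df.index)

# Name the columns that exist; anything beyond the known names
# gets a hypothetical 'extra_N' label instead of raising an error.
names = ['UserID', 'email', 'address']
expanded.columns = [
    names[i] if i < len(names) else f'extra_{i}'
    for i in range(expanded.shape[1])
]

# Attach the expanded columns and drop the nested one.
result = df.drop(columns='col3').join(expanded)
```

Whether missing fields should stay empty or be filled with a default depends on the data; `result.fillna('')` would be one option.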

This is what I came up with. It includes some cleanup of the raw file and a conversion to a dictionary.

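import pandas as pd 

# Read the raw file as text, one string per line (Python 3) 
with open('/path/to/file', 'r') as f: 
    data = f.readlines() 

# Split each line into records on the closing brace 
data = [line.split('}') for line in data] 
data_df = pd.DataFrame(data) 
data_dfn = data_df.transpose() 
# Strip leading bracket/brace characters and quotes, then split each record into fields 
data_new = data_dfn[0].map(lambda x: x.lstrip('[,{)').replace("'","").split(',')) 

s = pd.DataFrame(data_new) 
# Map each record's index to its list of fields 
d = dict(data_new) 
# Wrapping each list in a Series lets pandas pad shorter records with NaN 
D = pd.DataFrame({k: pd.Series(v) for k, v in d.items()}) 
D = D.transpose()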
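The last two steps are what deal with records of unequal length. A minimal sketch of that step in isolation, using a hypothetical in-memory dictionary instead of the file:

```python
import pandas as pd

# Hypothetical records of unequal length, keyed by record index.
records = {0: ['a', 'b', 'c'], 1: ['d', 'e']}

# Each list becomes a Series; the constructor aligns them on the union
# of their indexes and fills the gaps with NaN.
D = pd.DataFrame({k: pd.Series(v) for k, v in records.items()})

# Transpose so that each record is a row and each field a column.
D = D.transpose()
```

Here D has two rows and three columns, and the missing third field of record 1 is NaN.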

Source

2015-09-18 21:42:03 DNburtonguster
