2017-03-01

Pandas: split a DataFrame column and get headers

I have a pandas DataFrame with a column "A":

import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']})

I want to split column "A" and get a new DataFrame like this:

                                     A        AB      ABP  AC        AF    AN    AO
0         AB=0.246154;ABP=39.3908;AC=3  0.246154  39.3908   3      None  None  None
1  AB=0.3;ABP=9.95901;AC=2;AF=0.333333       0.3  9.95901   2  0.333333  None  None
2       AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86         0        0   6         1     6    86
3         AB=0.461538;ABP=3.51141;AC=2  0.461538  3.51141   2      None  None  None

I tried splitting the column with

dfc.A.str.split(';', expand=True)

but that gives a DataFrame like this:

             0            1     2            3     4      5
0  AB=0.246154  ABP=39.3908  AC=3         None  None   None
1       AB=0.3  ABP=9.95901  AC=2  AF=0.333333  None   None
2         AB=0        ABP=0  AC=6         AF=1  AN=6  AO=86
3  AB=0.461538  ABP=3.51141  AC=2         None  None   None

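How can I use the text before "=" as the column headers, and add this new DataFrame to the original one? Is there a Pythonic way to do both in one line?

Thanks

Answers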


Use str.extractall:

e = dfc.A.str.extractall('([^;]+)=([^;]+)') 
pd.Series(e.values[:, 1], [e.index.get_level_values(0), e.values[:, 0]]).unstack() 

         AB      ABP  AC        AF    AN    AO
0  0.246154  39.3908   3      None  None  None
1       0.3  9.95901   2  0.333333  None  None
2         0        0   6         1     6    86
3  0.461538  3.51141   2      None  None  None
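The extractall result has one row per key=value match, indexed by (original row, match number). A minimal sketch of the same reshaping, assuming you prefer named capture groups (the names key and val are chosen here for illustration) and index operations over rebuilding a Series from .values:

```python
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3',
                          'AB=0.3;ABP=9.95901;AC=2;AF=0.333333',
                          'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86',
                          'AB=0.461538;ABP=3.51141;AC=2']})

# One row per "key=value" match; the index is (original row, "match" number).
# Group names "key" and "val" are illustrative and become the column names.
e = dfc["A"].str.extractall(r"(?P<key>[^;=]+)=(?P<val>[^;]*)")

# Drop the match-number level, move "key" into the index,
# then pivot the keys out into columns (missing keys become NaN)
wide = (e.reset_index(level="match", drop=True)
         .set_index("key", append=True)["val"]
         .unstack())
print(wide)
```

Naming the groups means the reshaping refers to columns by name rather than by position 0 and 1; the values are still strings at this point.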

This should work:

import pandas as pd

d = {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']}
# Split each string into "key=value" cells, then each cell into a (key, value) pair
rows = [s.split(";") for s in d["A"]]
data = [dict(cell.split('=') for cell in row) for row in rows]

# Keys missing from a row become NaN
df = pd.DataFrame(data)
print(df)

# Same parsing, applied to column A of an existing DataFrame
d = {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']}
dfc = pd.DataFrame(d)

f = lambda s: dict(cell.split('=') for cell in s.split(';'))
df = pd.DataFrame(dfc.A.apply(f).tolist())
print(df)
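Both versions leave the values as strings, and the new frame does not include column A. A minimal sketch of finishing the job, assuming every value is numeric: build the parsed frame on dfc's index, convert it with pd.to_numeric, and attach it with DataFrame.join. The helper name to_dict is hypothetical.

```python
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3',
                          'AB=0.3;ABP=9.95901;AC=2;AF=0.333333',
                          'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86',
                          'AB=0.461538;ABP=3.51141;AC=2']})


def to_dict(s):
    # Hypothetical helper: "AB=0.3;AC=2" -> {"AB": "0.3", "AC": "2"}
    return dict(cell.split("=") for cell in s.split(";"))


# Reuse dfc's index so join lines the rows up correctly
parsed = pd.DataFrame(dfc["A"].apply(to_dict).tolist(), index=dfc.index)
# Assumes every value is numeric; keys missing from a row stay NaN
parsed = parsed.apply(pd.to_numeric)
result = dfc.join(parsed)
print(result)
```

If some values might not be numeric, passing lambda c: pd.to_numeric(c, errors="coerce") to apply turns them into NaN instead of raising.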

Output:

         AB      ABP  AC        AF   AN   AO
0  0.246154  39.3908   3       NaN  NaN  NaN
1       0.3  9.95901   2  0.333333  NaN  NaN
2         0        0   6         1    6   86
3  0.461538  3.51141   2       NaN  NaN  NaN

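Try the following: split each string in column A into its key=value parts and build a Series from them (via a dict); the index/keys become the headers of the result. Use pd.concat to join the original column A with the new DataFrame if needed: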

dfc.A.apply(lambda x: pd.Series(dict(s.split("=") for s in x.split(";")))) 

#         AB      ABP  AC        AF   AN   AO
#0  0.246154  39.3908   3       NaN  NaN  NaN
#1       0.3  9.95901   2  0.333333  NaN  NaN
#2         0        0   6         1    6   86
#3  0.461538  3.51141   2       NaN  NaN  NaN
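The answer mentions pd.concat for putting column A back. A minimal sketch of that step, where parse is just a named (hypothetical) version of the lambda above:

```python
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3',
                          'AB=0.3;ABP=9.95901;AC=2;AF=0.333333',
                          'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86',
                          'AB=0.461538;ABP=3.51141;AC=2']})


def parse(x):
    # Each "key=value" token becomes one entry; the keys become the Series index
    return pd.Series(dict(s.split("=") for s in x.split(";")))


# apply with a Series-returning function yields a DataFrame whose columns are the keys
new = dfc["A"].apply(parse)
# axis=1 puts the new columns beside A, aligned on the row index
out = pd.concat([dfc, new], axis=1)
print(out)
```

Because both frames share dfc's index, concat aligns rows by label rather than by position.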
import pandas as pd

def spliter(data):
    # Split "k1=v1;k2=v2" into [k, v] pairs and return them as a Series keyed by k
    pairs = [x.split("=") for x in data.split(";")]
    return pd.Series({key: val for key, val in pairs})


dfc.A.apply(spliter) 


         AB      ABP  AC        AF   AN   AO
0  0.246154  39.3908   3       NaN  NaN  NaN
1       0.3  9.95901   2  0.333333  NaN  NaN
2         0        0   6         1    6   86
3  0.461538  3.51141   2       NaN  NaN  NaN
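spliter (and the dict(cell.split('=')) answers) assume every token contains exactly one "="; a token with no "=", or an empty token from a trailing ";", raises ValueError. A hedged variant, assuming you would rather skip empty tokens and keep everything after the first "=" as the value, uses str.partition. The name spliter_safe and the sample strings are made up to exercise those cases.

```python
import pandas as pd


def spliter_safe(data):
    # Hypothetical variant of spliter: partition splits on the first "=" only
    # and never raises; empty tokens (e.g. from a trailing ";") are skipped
    pairs = (tok.partition("=") for tok in data.split(";") if tok)
    return pd.Series({key: val for key, _, val in pairs})


# Made-up input: a trailing ";", a bare flag, and a value containing "="
dfc = pd.DataFrame({"A": ["AB=0.3;AC=2;", "AB=0;FLAG;URL=a=b"]})
out = dfc["A"].apply(spliter_safe)
print(out)
```

A bare token such as FLAG ends up with an empty-string value rather than NaN, which keeps "present without a value" distinguishable from "absent".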