2016-05-04 62 views

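Splitting several columns using pandas

I want to split the strings in several columns. For example, I want to extract some information from col2, col3 and col5 of the data frame below (in reality I have more than a hundred columns to process this way).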

import pandas as pd

d = pd.DataFrame({
        'col1' : ['USA', 'AGN'],
        'col2' : ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
        'col3' : ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
        'col4' : ['done', 'done'],
        'col5' : ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2']
        })

  col1                     col2                     col3  col4 .....
0  USA  0|0:0.014:0.986,0.013,0  1|0:0.024:0.9,0.01345,2  done .....
1  AGN   1|0:0.02:1.936,0.023,1     0|2:0.213:0.92,0.1,2  done .....

I only need the first 3 characters of each long string. The result I am hoping for looks like this:

col1 col2 col3 col4 col5 .... 
USA 0|0 1|0 done 2|0 .... 
AGN 1|0 0|2 done 1|0 .... 

Any hints?

Answers


If I understand your question correctly, you can do it like this:

In [254]: d.replace(r':.*', '', regex=True) 
Out[254]: 
    col1 col2 col3 col4 col5 
0 USA 0|0 1|0 done 2|0 
1 AGN 1|0 0|2 done 1|0 
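
Note that calling replace on the whole frame also runs the pattern against col1 and col4. That is harmless here because those values contain no ':', but with more than a hundred columns it may be safer to limit the replacement to the columns you mean. A minimal sketch, assuming the target columns are listed by hand in `cols` and the result goes into a copy named `result` (both names are only for illustration):

```python
import pandas as pd

d = pd.DataFrame({
    'col1': ['USA', 'AGN'],
    'col2': ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
    'col3': ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
    'col4': ['done', 'done'],
    'col5': ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2'],
})

# Assumption: these are the only columns that should be shortened.
cols = ['col2', 'col3', 'col5']

# Work on a copy so the original frame stays available for comparison.
result = d.copy()

# Drop everything from the first ':' onward, but only in the chosen columns.
result[cols] = result[cols].replace(r':.*', '', regex=True)

print(result)
```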

To get the first three characters of each string:

>>> d.col2.str[:3] 
0 0|0 
1 1|0 
Name: col2, dtype: object 

To split on ':' and take the first item:

>>> d.col2.str.split(':', expand=True)[0] 
0 0|0 
1 1|0 
Name: 0, dtype: object 
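
Slicing and splitting give the same result on the sample data, but they can differ when the part before the first ':' is not exactly three characters long. A small sketch of the difference; the second value in `s` is made up for illustration and does not come from the question:

```python
import pandas as pd

# The second value is hypothetical: its prefix before ':' has four characters.
s = pd.Series(['0|0:0.014:0.986', '10|2:0.5:0.1'])

# Fixed-width slice: always keeps exactly the first three characters.
by_slice = s.str[:3]

# Split once on the first ':' and keep the part in front of it.
by_split = s.str.split(':', n=1).str[0]

print(by_slice.tolist())
print(by_split.tolist())
```

If the prefix is always three characters, slicing is the simpler choice; if its length can vary, splitting on ':' is probably the safer one.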

To apply it to a set of columns:

cols = ['col2', 'col3', 'col5'] 
d.loc[:, cols] = d.loc[:, cols].apply(lambda s: s.str[:3]) 

>>> d 
    col1 col2 col3 col4 col5 
0 USA 0|0 1|0 done 2|0 
1 AGN 1|0 0|2 done 1|0 
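
With more than a hundred columns, typing out `cols` by hand gets tedious. One possible approach, assuming you know the few columns that should be left alone (here 'col1' and 'col4', held in the illustrative name `keep`), is to build the list from everything else:

```python
import pandas as pd

d = pd.DataFrame({
    'col1': ['USA', 'AGN'],
    'col2': ['0|0:0.014:0.986,0.013,0', '1|0:0.02:1.936,0.023,1'],
    'col3': ['1|0:0.024:0.9,0.01345,2', '0|2:0.213:0.92,0.1,2'],
    'col4': ['done', 'done'],
    'col5': ['2|0:0.02:1.936,0.023,1', '1|0:0.024:0.9,0.01345,2'],
})

# Assumption: only these columns must stay untouched.
keep = ['col1', 'col4']

# Every other column is treated as one that needs shortening.
cols = d.columns.difference(keep)

# Keep the first three characters of each string in the selected columns.
result = d.copy()
result[cols] = result[cols].apply(lambda s: s.str[:3])

print(result)
```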

Thanks for the clear explanation. :) – Sakura