2013-10-05

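Split a column in pandas

I'm new to pandas and I'm trying to split one column of data into two columns. Specifically, I want to split on the '-' character, and I'd like the resulting columns to be 'FICO.low' and 'FICO.high'.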

loansData['FICO.Range'][0:5] 

- 81174 --- 735-739 
- 99592 --- 715-719 
- 80059 --- 690-694 
- 15825 --- 695-699 
- 33182 --- 695-699 

Name: FICO.Range, dtype: object 
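Splitting directly on the hyphen, as described above, can be sketched with Series.str.split(expand=True), which is available in later pandas releases than the one discussed in this thread. This is a minimal sketch that assumes the numbers on the left (81174, 99592, ...) are index labels and each cell holds a range string such as '735-739'; the names fico_range and parts are illustrative.

```python
import pandas as pd

# Assumption: the left-hand numbers are index labels and each value is 'low-high'.
fico_range = pd.Series(
    ["735-739", "715-719", "690-694", "695-699", "695-699"],
    index=[81174, 99592, 80059, 15825, 33182],
    name="FICO.Range",
)

# Split on the hyphen; expand=True returns a DataFrame with one column per piece.
parts = fico_range.str.split("-", expand=True)
parts.columns = ["FICO.low", "FICO.high"]

# The pieces are still strings; convert them to integers for numeric work.
parts = parts.astype(int)
print(parts)
```

The original index is preserved, so the result can be joined back onto the source frame by label.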

Answer


Use extract (available in the upcoming 0.13 release):

In [140]: s 
Out[140]: 
0 81174 --- 735-739 
1 99592 --- 715-719 
2 80059 --- 690-694 
3 15825 --- 695-699 
4 33182 --- 695-699 
Name: column, dtype: object 

In [141]: res = s.str.extract('(.+) --- (?P<FICO_low>.+)-(?P<FICO_high>.+)') 

In [142]: res 
Out[142]: 
     0 FICO_low FICO_high 
0 81174  735  739 
1 99592  715  719 
2 80059  690  694 
3 15825  695  699 
4 33182  695  699 
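The same extract call still works in current pandas. A self-contained version that builds s itself might look like the following; expand=True is passed explicitly because str.extract gained an expand keyword in later releases. Named groups become column names, while the unnamed first group is labelled 0.

```python
import pandas as pd

s = pd.Series(
    ["81174 --- 735-739", "99592 --- 715-719", "80059 --- 690-694",
     "15825 --- 695-699", "33182 --- 695-699"],
    name="column",
)

# (?P<name>...) groups are used as column labels; the unnamed group gets label 0.
res = s.str.extract(r"(.+) --- (?P<FICO_low>.+)-(?P<FICO_high>.+)", expand=True)
print(res)
```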

In older versions of pandas you can do it like this:

In [22]: res = s.str.match('(.+) --- (.+)-(.+)') 

In [23]: res 
Out[23]: 
0 (81174, 735, 739) 
1 (99592, 715, 719) 
2 (80059, 690, 694) 
3 (15825, 695, 699) 
4 (33182, 695, 699) 
Name: column, dtype: object 

In [24]: df = DataFrame([list(t) for t in res.values], columns=[0, 'FICO_low', 'FICO_high']) 

In [25]: df 
Out[25]: 
     0 FICO_low FICO_high 
0 81174  735  739 
1 99592  715  719 
2 80059  690  694 
3 15825  695  699 
4 33182  695  699 
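In current pandas, str.match returns booleans rather than tuples of groups, so the tuple-based approach above no longer applies as written. A rough equivalent, assuming every row matches the pattern, applies Python's re module to each value and then builds the DataFrame from the list of tuples, passing the original index so rows stay aligned.

```python
import re

import pandas as pd

s = pd.Series(["81174 --- 735-739", "99592 --- 715-719", "80059 --- 690-694"],
              name="column")

# Assumes every value matches; .groups() would fail on a None match otherwise.
pattern = re.compile(r"(.+) --- (.+)-(.+)")
tuples = [pattern.match(v).groups() for v in s]

# Passing index=s.index keeps the result aligned with the source Series.
df = pd.DataFrame(tuples, index=s.index, columns=[0, "FICO_low", "FICO_high"])
print(df)
```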

If you really want '.' in the column names:

In [28]: df.rename(columns=lambda x: x.replace('_', '.') if isinstance(x, str) else x) 
Out[28]: 
     0 FICO.low FICO.high 
0 81174  735  739 
1 99592  715  719 
2 80059  690  694 
3 15825  695  699 
4 33182  695  699 

but then you won't be able to tab-complete them anymore :(
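Because 'FICO.low' is not a valid Python identifier, the renamed columns can only be reached with bracket indexing, not attribute access. A small sketch of that consequence, using a hypothetical two-row frame shaped like the one above:

```python
import pandas as pd

# Hypothetical two-row frame with the same column labels as above.
df = pd.DataFrame({0: ["81174", "99592"], "FICO_low": ["735", "715"],
                   "FICO_high": ["739", "719"]})

# Replace '_' with '.' only in string labels; the integer label 0 is left alone.
dotted = df.rename(columns=lambda x: x.replace("_", ".") if isinstance(x, str) else x)

# dotted.FICO.low would look up an attribute named 'FICO' and fail,
# so bracket indexing is the only way to reach the column.
low = dotted["FICO.low"]
print(low.tolist())
```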

FYI, I played a bit fast and loose with the regex here; you may want to restrict the characters the groups match by using '\d+' instead of '.+'.
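With the groups tightened to \d+, a row that does not fit the pattern yields NaN instead of arbitrary text. The sketch below adds a hypothetical malformed row to show this, then converts the captured strings to numbers with pd.to_numeric; the numeric conversion is an extra step not in the answer above.

```python
import pandas as pd

# The last value is a hypothetical malformed row.
s = pd.Series(["81174 --- 735-739", "99592 --- 715-719", "not a range"])

# \d+ only matches digits, so a non-matching row comes back as all-NaN.
res = s.str.extract(r"(\d+) --- (?P<FICO_low>\d+)-(?P<FICO_high>\d+)")

# Convert each captured column from strings to numbers; missing rows stay missing.
res = res.apply(pd.to_numeric)
print(res)
```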