2017-05-10 69 views

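Pandas: replace the entire text of a cell using a regular expression

I have a DataFrame with a column named 'Qualification'. It has values like: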

b.tech       
graduate       
btech        
hsc        
degree        
12th pass       
pharm.d 2nd year     
b pharm       
pursuing b pharm     
ssc        
b.pharm       
mba        
bsc        
no         
student       
pharm.d 3rd year     
b.com        
bcom        
ug         
diploma       
b tech        

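I want to make the data consistent by replacing specific values with other text. For example, b tech, b.tech, and bachelors in X should become Graduate, and Masters, M.Com, etc. should become Post Graduate. How can I do this with a regular expression?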

Answer


You can do it this way:

to_replace = [r'SearchRegEx1', r'SearchRegEx2', ...] 
value = [r'ReplaceRegEx1', r'ReplaceRegEx2', ...] 

Then:

df['col_name'] = df['col_name'].replace(to_replace, value, regex=True) 
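As a minimal, self-contained sketch of this recipe (the sample values and the patterns here are made up for illustration and are not from the question), the two lists are paired by position, and with regex=True only the matched part of each string is substituted:

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({'col_name': ['b.tech', 'pursuing b pharm', 'mba']})

# Patterns and replacements are paired by position:
# to_replace[0] -> value[0], to_replace[1] -> value[1].
to_replace = [r'b[.\s]+\w+', r'^mba$']
value = ['Graduate', 'Post Graduate']

df['col_name'] = df['col_name'].replace(to_replace, value, regex=True)

# Only the matched substring changes, so 'pursuing b pharm'
# keeps its leading word and becomes 'pursuing Graduate'.
result = df['col_name'].tolist()
print(result)
```

If the whole cell should change whenever a pattern is found, the pattern has to cover the entire string, for example by anchoring it with ^ and $.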

Demo:

In [124]: to_replace = [r'btech|b[\.\s]+\w+|bachelors\b.*', r'Masters|M\.Com'] 
    ...: value = ['Graduate', 'Post Graduate'] 
    ...: 

In [125]: df['col'] = df['col'].replace(to_replace, value, regex=True) 

In [126]: df 
Out[126]: 
         col
0   Graduate
1   graduate
2   Graduate
3        hsc
4     degree
5       12th
6    pharm.d
7          b
8   pursuing
9        ssc
10  Graduate
11       mba
12       bsc
13        no
14   student
15   pharm.d
16  Graduate
17      bcom
18        ug
19   diploma
20         b
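Rows such as 7 and 20 show b rather than b pharm or b tech, which suggests (this is an inference, not something stated in the answer) that the demo frame held only the first word of each value. On the full multi-word values, the patterns above would replace just the matched substring. A hedged sketch of whole-cell replacement, assuming anchored, case-insensitive patterns built from the question's own examples (the column name col and the exact patterns are illustrative choices):

```python
import pandas as pd

# A few values taken from the question, plus its own examples.
df = pd.DataFrame({'col': ['b.tech', 'btech', 'b tech', 'pursuing b pharm',
                           'bachelors in X', 'M.Com', 'graduate']})

# ^ and $ force each pattern to cover the whole cell, so a match
# replaces the entire value; (?i) makes the match case-insensitive.
to_replace = [
    r'(?i)^\s*(?:b[.\s]?tech|bachelors\b.*)\s*$',
    r'(?i)^\s*(?:masters\b.*|m\.?com)\s*$',
]
value = ['Graduate', 'Post Graduate']

df['col'] = df['col'].replace(to_replace, value, regex=True)

# Cells that match no pattern (e.g. 'pursuing b pharm') stay as they were.
result = df['col'].tolist()
print(result)
```

Anchoring keeps unrelated cells untouched; which qualifications count as Graduate or Post Graduate is a data decision, so the alternatives inside each group would need to be extended to suit the real column.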

Thanks. I'll give it a try.