2016-03-17

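Pandas: replace multiple values at once

I'm trying to clean up some data I got from an Excel file. The file has 7400 rows and 18 columns and contains a list of customers with their addresses and other data. The problem is that some city names are misspelled, which distorts the information and makes further processing difficult.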

  | SURNAME  | ADDRESS          | CITY
0 | Jenson   | 252 Des Chênes   | D.DO
1 | Jean     | 236 Gouin        | DOLLARD
2 | Denis    | 993 Boul. Gouin  | DOLLARD-DES-ORMEAUX
3 | Bradford | 1690 Dollard #7  | DDO
4 | Alisson  | 115 Du Buisson   | IL PERROT
5 | Abdul    | 9877 Boul. Gouin | Pierrefonds
6 | O'Neil   | 5 Du College     | Ile Bizard
7 | Bundy    | 7345 Sherbrooke  | ILLE Perot
8 | Darcy    | 8671 Anthony #2  | ILE Perrot
9 | Adams    | 845 Georges      | Pierrefonds

In the example above, D.DO, DOLLARD and DDO should all be spelled DOLLARD-DES-ORMEAUX, and IL PERROT, ILLE PEROT and ILE PERROT should all be spelled ILE-PERROT.
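For anyone who wants to run the snippets in this thread, the sample rows can be rebuilt as a small DataFrame. This is only the three columns shown in the table (the real file has 18), and the variable name df simply matches the one used in the code below.

```python
import pandas as pd

# The ten sample rows from the table above; only SURNAME, ADDRESS and CITY.
df = pd.DataFrame(
    {
        "SURNAME": ["Jenson", "Jean", "Denis", "Bradford", "Alisson",
                    "Abdul", "O'Neil", "Bundy", "Darcy", "Adams"],
        "ADDRESS": ["252 Des Chênes", "236 Gouin", "993 Boul. Gouin",
                    "1690 Dollard #7", "115 Du Buisson", "9877 Boul. Gouin",
                    "5 Du College", "7345 Sherbrooke", "8671 Anthony #2",
                    "845 Georges"],
        "CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
                 "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot",
                 "Pierrefonds"],
    }
)
print(df)
```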

I've been able to replace the values with:

df["CITY"] = df["CITY"].replace(to_replace=["D.DO", "DOLLARD", "DDO"], value="DOLLARD-DES-ORMEAUX")
df["CITY"] = df["CITY"].replace(to_replace=["IL PERROT", "ILLE PEROT", "ILE PERROT"], value="ILE-PERROT")
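One thing worth checking with these calls is the regex=True flag. With it, each to_replace string is treated as a regular expression that is substituted wherever it occurs inside a cell, not as a whole-cell match (and the dot in "D.DO" matches any character). A minimal demonstration on two values from the sample:

```python
import pandas as pd

s = pd.Series(["DOLLARD", "DOLLARD-DES-ORMEAUX"])

# Default (regex=False): only cells exactly equal to "DOLLARD" change.
exact = s.replace("DOLLARD", "DOLLARD-DES-ORMEAUX")

# regex=True: "DOLLARD" is a pattern found inside the already-correct value,
# so that value gets the suffix appended a second time.
pattern = s.replace("DOLLARD", "DOLLARD-DES-ORMEAUX", regex=True)

print(exact.tolist())    # ['DOLLARD-DES-ORMEAUX', 'DOLLARD-DES-ORMEAUX']
print(pattern.tolist())  # ['DOLLARD-DES-ORMEAUX', 'DOLLARD-DES-ORMEAUX-DES-ORMEAUX']
```

So for a fixed list of known misspellings, leaving regex at its default is the safer choice; regex only pays off when the patterns are anchored (^...$) to the whole cell.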

Is there a way to combine the above operations into one? I've tried:

df["CITY"].replace({to_replace={"D.DO", "DOLLARD", "DDO"}, value="DOLLARD-DES-ORMEAUX", to_replace={"IL PERROT", "ILLE PEROT", "ILE PERROT"}, value="ILE-PERROT"}, regex=True) 

but had no luck.

Answers


Try the .replace({}, regex=True) method with a nested dict (column -> {pattern: replacement}):

replacements = {
    'CITY': {
        # anchored to the whole cell; \. is a literal dot, so D.DO and DDO match
        r'^(D\.?DO|DOLLARD.*)$': 'DOLLARD-DES-ORMEAUX',
        # (?i) = case-insensitive; "P...ROT" must follow, so Ile Bizard is untouched
        r'(?i)^IL+E? PE?R+OT$': 'ILE-PERROT'}
}

df.replace(replacements, regex=True, inplace=True)

print(df)

Output:

    SURNAME           ADDRESS                 CITY
0    Jenson    252 Des Chênes  DOLLARD-DES-ORMEAUX
1      Jean         236 Gouin  DOLLARD-DES-ORMEAUX
2     Denis   993 Boul. Gouin  DOLLARD-DES-ORMEAUX
3  Bradford   1690 Dollard #7  DOLLARD-DES-ORMEAUX
4   Alisson    115 Du Buisson           ILE-PERROT
5     Abdul  9877 Boul. Gouin          Pierrefonds
6    O'Neil      5 Du College           Ile Bizard
7     Bundy   7345 Sherbrooke           ILE-PERROT
8     Darcy   8671 Anthony #2           ILE-PERROT
9     Adams       845 Georges          Pierrefonds
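For the literal spellings listed in the question, regex is not strictly needed: Series.replace also accepts a single dict mapping each old value to its new value, which folds the question's two calls into one. A minimal sketch on the sample CITY values (the name fixes is only illustrative):

```python
import pandas as pd

city = pd.Series(["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO", "IL PERROT",
                  "Pierrefonds", "Ile Bizard", "ILLE Perot", "ILE Perrot",
                  "Pierrefonds"], name="CITY")

# One {old: new} dict covers both target cities. regex defaults to False,
# so each key must equal the whole cell exactly (case-sensitive), and the
# dot in "D.DO" is taken literally.
fixes = {
    "D.DO": "DOLLARD-DES-ORMEAUX",
    "DOLLARD": "DOLLARD-DES-ORMEAUX",
    "DDO": "DOLLARD-DES-ORMEAUX",
    "IL PERROT": "ILE-PERROT",
    "ILLE PEROT": "ILE-PERROT",
    "ILE PERROT": "ILE-PERROT",
}
cleaned = city.replace(fixes)
print(cleaned.tolist())
```

On the full frame this would be df["CITY"] = df["CITY"].replace(fixes). "ILLE Perot" and "ILE Perrot" are left as they are because the match is exact and case-sensitive; upper-casing the column first with df["CITY"].str.upper() would catch them, at the cost of upper-casing every other city too.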

You can create a dictionary of replacements, then iterate over it, using 'loc' to make each replacement.

target_for_values = { 
    'DOLLARD-DES-ORMEAUX': ['D.DO', 'DOLLARD', 'DDO'], 
    'ILE-PERROT': ['IL PERROT', 'ILLE PEROT', 'ILE PERROT']} 

for k, v in target_for_values.items():
    df.loc[df.CITY.str.upper().isin(v), 'CITY'] = k 

>>> df.CITY
0    DOLLARD-DES-ORMEAUX
1    DOLLARD-DES-ORMEAUX
2    DOLLARD-DES-ORMEAUX
3    DOLLARD-DES-ORMEAUX
4             ILE-PERROT
5            Pierrefonds
6             Ile Bizard
7             ILE-PERROT
8             ILE-PERROT
9            Pierrefonds
Name: CITY, dtype: object
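The loop above makes one pass over the column per target city. A variant, sketched under the same target_for_values dict, inverts it into a {variant: canonical} lookup (lookup is a hypothetical name) and maps the upper-cased column through it once. Cities with no entry come back as NaN from map and are filled from the original column, so they keep their original spelling and case.

```python
import pandas as pd

df = pd.DataFrame({"CITY": ["D.DO", "DOLLARD", "DOLLARD-DES-ORMEAUX", "DDO",
                            "IL PERROT", "Pierrefonds", "Ile Bizard",
                            "ILLE Perot", "ILE Perrot", "Pierrefonds"]})

target_for_values = {
    "DOLLARD-DES-ORMEAUX": ["D.DO", "DOLLARD", "DDO"],
    "ILE-PERROT": ["IL PERROT", "ILLE PEROT", "ILE PERROT"],
}

# Invert to {variant: canonical} so every cell is looked up exactly once.
lookup = {variant: target
          for target, variants in target_for_values.items()
          for variant in variants}

# Compare in upper case; unmatched cells (NaN after map) fall back to the
# original value via index-aligned fillna.
df["CITY"] = df["CITY"].str.upper().map(lookup).fillna(df["CITY"])
print(df["CITY"].tolist())
```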