Efficiently cleaning data when importing a CSV file with pandas

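I am importing a dataset with Python's pandas that unfortunately needs some cleaning. After importing, I need to remove all quotation marks and spaces from two columns (alpha2 and alpha3). This is how I currently do it: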

import pandas as pd

# Add alpha2 country codes to custom dataset to normalize data
country_codes = pd.read_csv('datasets/country_codes.csv').rename(columns={'Alpha-2 code': 'alpha2', 'Alpha-3 code': 'alpha3'})
# Remove quotation marks and spaces from both code columns
country_codes['alpha2'] = country_codes['alpha2'].str.replace('"', '')
country_codes['alpha2'] = country_codes['alpha2'].str.replace(' ', '')
country_codes['alpha3'] = country_codes['alpha3'].str.replace('"', '')
country_codes['alpha3'] = country_codes['alpha3'].str.replace(' ', '')

In my opinion this is a bit ugly, since I need five lines of code for a few simple operations. Can this be done more efficiently with less code?


2017-09-24 hY8vVpf3tyR57Xib

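You can use df.replace with a regex, as follows:

# Note: the [['alpha2', 'alpha3']] selection is a copy, so inplace=True
# may leave country_codes itself unchanged.
country_codes[['alpha2', 'alpha3']].replace(r'"|\s', '',
                                            regex=True,
                                            inplace=True)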

The complete code looks like this:

import pandas as pd

country_codes = pd.read_csv('datasets/country_codes.csv').rename(columns={'Alpha-2 code': 'alpha2', 'Alpha-3 code': 'alpha3'})
country_codes[['alpha2', 'alpha3']].replace(r'"|\s', '',
                                            regex=True,
                                            inplace=True)

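However, as @Jeff points out in the comment below, it is better not to use inplace=True here. Selecting country_codes[['alpha2', 'alpha3']] produces a copy, so the in-place call may not update country_codes at all. Assign the result back instead: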

country_codes[['alpha2', 'alpha3']] = country_codes[['alpha2', 'alpha3']].replace(
    r'"|\s', '', regex=True)

For more details, see the documentation.
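To try the assignment approach without the original file, here is a minimal sketch. The sample rows and the `code_columns` name are assumptions made for illustration; they stand in for whatever datasets/country_codes.csv actually contains after the rename.

```python
import pandas as pd

# Hypothetical rows standing in for datasets/country_codes.csv after
# read_csv(...).rename(...); the codes carry stray quotes and spaces.
country_codes = pd.DataFrame({
    'country': ['Afghanistan', 'Albania'],
    'alpha2': [' "AF"', ' "AL"'],
    'alpha3': [' "AFG"', ' "ALB"'],
})

code_columns = ['alpha2', 'alpha3']

# Remove every double quote and whitespace character from both columns,
# then assign the cleaned result back to the original frame.
country_codes[code_columns] = country_codes[code_columns].replace(
    r'"|\s', '', regex=True
)

print(country_codes)
```

The pattern `"|\s` matches either a double quote or any whitespace character, so a single pass replaces the four separate str.replace calls from the question.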


2017-09-24 18:37:08 MedAli

Using inplace=True in a chained expression is non-idiomatic and may only work some of the time; simply assign the returned new values instead – Jeff
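The point in this comment can be checked with a small sketch. The one-row frame below is hypothetical, and the warning filter is only there because pandas may warn about the chained in-place call, depending on the version.

```python
import warnings

import pandas as pd

# Hypothetical one-row frame with the same kind of dirty codes.
df = pd.DataFrame({
    'country': ['Afghanistan'],
    'alpha2': [' "AF"'],
    'alpha3': [' "AFG"'],
})
cols = ['alpha2', 'alpha3']

# df[cols] builds a new DataFrame, so inplace=True edits that temporary
# object rather than df. Silence any warning pandas raises about this.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[cols].replace(r'"|\s', '', regex=True, inplace=True)
after_inplace = df['alpha2'].tolist()

# Assigning the returned frame back does update df.
df[cols] = df[cols].replace(r'"|\s', '', regex=True)
after_assignment = df['alpha2'].tolist()

print(after_inplace, after_assignment)
```

In this sketch the chained in-place call leaves the frame untouched, while the plain assignment cleans it.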
