Changing the text of a column in a pandas DataFrame

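I am finding this seemingly simple operation surprisingly difficult. I have a DataFrame with a column named CompanyId. Its values are 'COMP23', 'COMP55', and so on. When I try to strip the 'COMP' prefix and convert the rest to a number, it defeats me. Here is what I am doing: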

df['companyId'] = df['companyId'].astype('str') # because type was 'object'. 

df['companyId'].map(lambda x: int(x[4:]))

Where am I going wrong? I noticed that df is a Series object.

Try:

df['companyId'] = df['companyId'].map(lambda x: int(str(x)[4:]))
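A minimal sketch of why the assignment matters, assuming a small sample frame (the data below is made up for illustration): `map` returns a new Series and leaves the column untouched unless the result is assigned back.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({'companyId': ['COMP23', 'COMP55']})

# map() returns a new Series; without assignment the column is unchanged.
df['companyId'].map(lambda x: int(str(x)[4:]))
unchanged = df['companyId'].tolist()

# Assigning the result back replaces the column with integers.
df['companyId'] = df['companyId'].map(lambda x: int(str(x)[4:]))
converted = df['companyId'].tolist()
```

Here `unchanged` still holds the original strings, while `converted` holds the integers.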

2016-05-09 22:50:39 piRSquared

You can use a regular expression to extract the digits (\d+).

>>> df.CompanyId.str.extract(r'(\d+)', expand=False) 
0 23 
1 55 
Name: CompanyId, dtype: object

Note that your original approach works correctly.

>>> df['CompanyId'].astype('str').map(lambda x: int(x[4:])) 
0 23 
1 55 
Name: CompanyId, dtype: int64

If you get an error, it is probably because there is a problem with the data.

df = pd.DataFrame({'CompanyId': ['COMP23', 'COMP55', 'COMP', '', 'COM55']})  
df['CompanyId'].astype('str').map(lambda x: int(x[4:]))

ValueError: invalid literal for int() with base 10: ''
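One way to locate the offending rows before converting is to check each value against the expected shape. This is only a sketch: the pattern `^COMP\d+$` is an assumption about what a valid ID looks like. Note that 'COM55' does not raise with the slicing approach; `'COM55'[4:]` is `'5'`, so it would silently become 5.

```python
import pandas as pd

df = pd.DataFrame({'CompanyId': ['COMP23', 'COMP55', 'COMP', '', 'COM55']})

# True where the value is exactly 'COMP' followed by one or more digits.
# na=False treats missing values as invalid rather than propagating NaN.
is_valid = df['CompanyId'].str.match(r'^COMP\d+$', na=False)

# Rows that would either raise in int() or convert to a wrong number.
bad_rows = df.loc[~is_valid, 'CompanyId']
```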

Note that the regex pattern still extracts the correct values:

>>> df.CompanyId.str.extract(r'(\d+)', expand=False) 
0  23 
1  55 
2 NaN 
3 NaN 
4  55
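The extracted values are still strings, so a further step is needed to get numbers. A possible sketch, assuming rows without digits should simply be dropped (that choice is an assumption, not something stated in the question):

```python
import pandas as pd

df = pd.DataFrame({'CompanyId': ['COMP23', 'COMP55', 'COMP', '', 'COM55']})

# expand=False returns a Series instead of a one-column DataFrame.
digits = df['CompanyId'].str.extract(r'(\d+)', expand=False)

# Convert the digit strings to numbers; rows with no match stay missing,
# so the result is typically a float column at this point.
numbers = pd.to_numeric(digits)

# Assumed policy: discard the rows that had no digits, then cast to int.
clean = numbers.dropna().astype(int)
```

`clean` keeps the original index labels, so the dropped rows can be traced back to `df`.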

2016-05-09 22:50:14 Alexander

Try this:

In [210]: df['companyId'].str.replace('COMP','').astype(int) 
Out[210]: 
0  23 
1  55 
2 101 
Name: companyId, dtype: int32

or

In [207]: df.companyId.str[4:].astype(int) 
Out[207]: 
0  23 
1  55 
2 101 
Name: companyId, dtype: int32
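Both variants make an implicit assumption: `str.replace('COMP', '')` removes 'COMP' wherever it occurs, and `str[4:]` assumes the prefix is exactly four characters long. A slightly stricter sketch, using sample data matching the output above, anchors the pattern to the start of the string and requests an explicit integer width (the `int32` shown above likely reflects that platform's default integer size):

```python
import pandas as pd

# Sample data assumed to match the output shown above.
df = pd.DataFrame({'companyId': ['COMP23', 'COMP55', 'COMP101']})

# '^COMP' with regex=True removes the prefix only at the start of each value.
# 'int64' makes the result dtype explicit instead of platform dependent.
ids = df['companyId'].str.replace(r'^COMP', '', regex=True).astype('int64')
```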

2016-05-09 22:50:19 MaxU
