2016-08-23 163 views
2

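pandas DataFrame: convert a column's type to string or categorical

How do I convert a single column of a pandas DataFrame to string type? In the housing data df below, I need to convert zipcode to a string so that when I run a linear regression, zipcode is treated as categorical rather than numeric. Thanks!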

import pandas as pd

df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5}, 'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603}, 'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4}, 'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180}, 'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0}})
print(df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
722         3.25         4     2.0         4670     51836    98005
2680        0.75         2     1.0         1440      3700    98107
14554       2.50         4     2.0         3180      9603    98155
17384       1.50         2     3.0         1430      1650    98125
18754       1.00         2     1.0         1130      2640    98109

Answers

0

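To convert a column to string type (which pandas itself stores as an object column), use astype:

df['zipcode'] = df.zipcode.astype(str)

If you want a Categorical column instead, pass 'category' as the argument to the same function:

df['zipcode'] = df.zipcode.astype('category')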
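
A quick way to check what each conversion produces is to inspect the result directly. This is only a minimal sketch: the three-row frame is a made-up miniature of the housing data, and the names as_text and as_category are introduced here for illustration.

```python
import pandas as pd

# Hypothetical miniature of the housing data, for illustration only.
df = pd.DataFrame({"zipcode": [98125, 98107, 98005], "bedrooms": [2, 2, 4]})

# astype returns a new Series; the original column is left untouched
# until the result is assigned back to the frame.
as_text = df["zipcode"].astype(str)
as_category = df["zipcode"].astype("category")

print(type(as_text.iloc[0]))             # each value is now a Python str
print(as_category.dtype)                 # category
print(list(as_category.cat.categories))  # the distinct zipcodes, sorted
```

Note that the categories are still the original integers; only the dtype of the column has changed.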
+0

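Thanks for the reply. When I try these methods (and others), I get the same error: 'train_more_features['zipcode'] = pd.Categorical(train_more_features.zipcode)' gives 'A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead' – jklaus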
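
That message is pandas' SettingWithCopyWarning, which usually appears when the frame being modified was itself obtained by slicing another frame. The following is a sketch under that assumption: the thread does not show how train_more_features was built, so the filter below is hypothetical. Taking an explicit copy before assigning is one common way to avoid the message.

```python
import pandas as pd

# Hypothetical stand-in for the full housing data.
df = pd.DataFrame({
    "zipcode": [98125, 98107, 98005, 98109],
    "sqft_living": [1430, 1440, 4670, 1130],
})

# Assumption: the training frame was produced by slicing a larger frame.
# .copy() makes it an independent DataFrame, so the assignment below
# modifies this frame only and pandas has nothing to warn about.
train_more_features = df[df["sqft_living"] < 2000].copy()
train_more_features["zipcode"] = train_more_features["zipcode"].astype("category")

print(train_more_features.dtypes)  # zipcode is category here
print(df.dtypes)                   # the source frame is unchanged
```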

5

You need astype:

df['zipcode'] = df.zipcode.astype(str) 
#df.zipcode = df.zipcode.astype(str) 

To convert to categorical:

df['zipcode'] = df.zipcode.astype('category') 
#df.zipcode = df.zipcode.astype('category') 

Another solution is Categorical:

df['zipcode'] = pd.Categorical(df.zipcode) 

Sample with data:

import pandas as pd 

df = pd.DataFrame({'zipcode': {17384: 98125, 2680: 98107, 722: 98005, 18754: 98109, 14554: 98155}, 'bathrooms': {17384: 1.5, 2680: 0.75, 722: 3.25, 18754: 1.0, 14554: 2.5}, 'sqft_lot': {17384: 1650, 2680: 3700, 722: 51836, 18754: 2640, 14554: 9603}, 'bedrooms': {17384: 2, 2680: 2, 722: 4, 18754: 2, 14554: 4}, 'sqft_living': {17384: 1430, 2680: 1440, 722: 4670, 18754: 1130, 14554: 3180}, 'floors': {17384: 3.0, 2680: 1.0, 722: 2.0, 18754: 1.0, 14554: 2.0}}) 
print(df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
722         3.25         4     2.0         4670     51836    98005
2680        0.75         2     1.0         1440      3700    98107
14554       2.50         4     2.0         3180      9603    98155
17384       1.50         2     3.0         1430      1650    98125
18754       1.00         2     1.0         1130      2640    98109

print(df.dtypes)
bathrooms      float64
bedrooms         int64
floors         float64
sqft_living      int64
sqft_lot         int64
zipcode          int64
dtype: object

df['zipcode'] = df.zipcode.astype('category') 

print(df)
       bathrooms  bedrooms  floors  sqft_living  sqft_lot  zipcode
722         3.25         4     2.0         4670     51836    98005
2680        0.75         2     1.0         1440      3700    98107
14554       2.50         4     2.0         3180      9603    98155
17384       1.50         2     3.0         1430      1650    98125
18754       1.00         2     1.0         1130      2640    98109

print(df.dtypes)
bathrooms       float64
bedrooms          int64
floors          float64
sqft_living       int64
sqft_lot          int64
zipcode        category
dtype: object
+0

Doesn't it work with 'astype'? – jezrael

+0

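'df.zipcode = df.zipcode.astype('category')' works. There is no error message and the dtypes display correctly, but the sklearn linear_model gives the same weights as when the column was an int type. Trying to cast to str with 'df.zipcode = df.zipcode.astype(str)' -> 'A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead'. After that, zipcode shows as type object, and linear_model gives it the same weights as before. I tried recreating the task in graphlab with zipcode as str, and it gave a lower training error. Thanks again! – jklaus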
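
The unchanged weights are consistent with the model still receiving zipcode as a single numeric column: as far as I know, scikit-learn's linear models work on a plain numeric array and do not interpret the pandas category dtype on their own. One common workaround, sketched here and not taken from the thread, is to expand the column into one indicator column per zipcode with pd.get_dummies before fitting. The small frame and the name features are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the housing data, for illustration only.
df = pd.DataFrame({
    "zipcode": [98125, 98107, 98005, 98107],
    "sqft_living": [1430, 1440, 4670, 1130],
})

# Replace zipcode with one indicator column per distinct value, so a
# linear model can learn a separate weight for each zipcode.
features = pd.get_dummies(df, columns=["zipcode"])

print(features.columns.tolist())
```

The resulting frame can then be passed to the regression in place of the original columns, so that each zipcode gets its own coefficient.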
