2016-08-15 29 views
    age          income  student  credit_rating  Class_buys_computer
0   youth        high    no       fair           no
1   youth        high    no       excellent      no
2   middle_aged  high    no       fair           yes
3   senior       medium  no       fair           yes
4   senior       low     yes      fair           yes
5   senior       low     yes      excellent      no
6   middle_aged  low     yes      excellent      yes
7   youth        medium  no       fair           no
8   youth        low     yes      fair           yes
9   senior       medium  yes      fair           yes
10  youth        medium  yes      excellent      yes
11  middle_aged  medium  no       excellent      yes
12  middle_aged  high    yes      fair           yes
13  senior       medium  no       excellent      no

I am using this dataset and want variables such as age and income to be categorical, like factor variables in R. How can I do this in Python?

I need a solution in Python (pandas). –

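R has built-in support for factors. pandas does have a categorical dtype, but many libraries require dummy variables instead. You may need pandas' get_dummies or scikit-learn's OneHotEncoder. – ayhan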
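
As a rough illustration of the dummy-variable route mentioned in this comment, the sketch below one-hot encodes two columns with pd.get_dummies. The four-row DataFrame is a made-up subset of the question's table, used only to keep the example short.

```python
import pandas as pd

# Hypothetical subset of the question's data, just enough to show the encoding.
df = pd.DataFrame({
    "age": ["youth", "middle_aged", "senior", "youth"],
    "student": ["no", "no", "yes", "yes"],
})

# One indicator (dummy) column is created per distinct value of each listed column.
dummies = pd.get_dummies(df, columns=["age", "student"])
print(dummies.columns.tolist())
```

Each row ends up with exactly one active indicator per original column, which is the form many modelling libraries expect.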

Answers

You can use astype with the argument 'category':

cols = ['age','income','student'] 

for col in cols: 
    df[col] = df[col].astype('category') 

print (df.dtypes) 
age     category 
income     category 
student    category 
credit_rating   object 
Class_buys_computer  object 
dtype: object 
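
To see what the conversion gives you beyond the dtype, the sketch below inspects the categories and their integer codes through the .cat accessor, which is roughly what levels() and the underlying integer vector are for an R factor. The small DataFrame is an assumed stand-in for the question's data.

```python
import pandas as pd

# Assumed stand-in for the first rows of the question's table.
df = pd.DataFrame({
    "age": ["youth", "youth", "middle_aged", "senior"],
    "income": ["high", "high", "high", "medium"],
})

for col in ["age", "income"]:
    df[col] = df[col].astype("category")

# Categories are inferred from the data and, for strings, sorted lexically.
levels = list(df["age"].cat.categories)
# Each row stores an integer code that indexes into the categories.
codes = df["age"].cat.codes.tolist()
print(levels)
print(codes)
```

Note that the inferred order is alphabetical, not youth, middle_aged, senior; a meaningful order has to be specified explicitly.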

If you need to convert all columns:

for col in df.columns: 
    df[col] = df[col].astype('category') 

print (df.dtypes) 
age     category 
income     category 
student    category 
credit_rating   category 
Class_buys_computer category 
dtype: object 

You need the loop because, in the pandas version used here, converting the whole DataFrame in one call fails:

df = df.astype('category') 

NotImplementedError: > 1 ndim Categorical are not supported at this time
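
If you would rather avoid an explicit loop, one possible alternative is to pass astype a mapping from column name to dtype, so each column is still converted individually. This is a sketch that assumes a pandas version whose DataFrame.astype accepts such a mapping; the two-column DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up two-column frame standing in for the question's data.
df = pd.DataFrame({
    "student": ["no", "yes", "no"],
    "credit_rating": ["fair", "excellent", "fair"],
})

# Map every column name to the 'category' dtype and convert in one call.
df = df.astype({col: "category" for col in df.columns})
print(df.dtypes)
```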

See the pandas documentation on categorical data.

Edit, following a comment:

If you need an ordered categorical, another solution is to use pandas.Categorical:

df['age']=pd.Categorical(df['age'],categories=["youth","middle_aged","senior"],ordered=True) 

print (df.age) 
0   youth 
1   youth 
2  middle_aged 
3   senior 
4   senior 
5   senior 
6  middle_aged 
7   youth 
8   youth 
9   senior 
10   youth 
11 middle_aged 
12 middle_aged 
13   senior 
Name: age, dtype: category 
Categories (3, object): [youth < middle_aged < senior] 
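
Besides the `<` signs in the printed categories, the ordering also enables comparisons and min/max. The sketch below shows this on a four-row column, which is an assumed sample and not the full dataset.

```python
import pandas as pd

# Assumed four-row sample of the age column.
df = pd.DataFrame({"age": ["youth", "senior", "middle_aged", "youth"]})
df["age"] = pd.Categorical(
    df["age"],
    categories=["youth", "middle_aged", "senior"],
    ordered=True,
)

# Ordered categoricals can be compared against one of their own categories.
younger_than_senior = (df["age"] < "senior").tolist()
# min and max follow the declared category order, not alphabetical order.
youngest, oldest = df["age"].min(), df["age"].max()
print(younger_than_senior, youngest, oldest)
```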

Then you can sort the DataFrame by the age column:

df = df.sort_values('age') 
print (df) 
      age income student credit_rating Class_buys_computer 
0   youth high  no   fair     no 
1   youth high  no  excellent     no 
7   youth medium  no   fair     no 
8   youth  low  yes   fair     yes 
10  youth medium  yes  excellent     yes 
2 middle_aged high  no   fair     yes 
6 middle_aged  low  yes  excellent     yes 
11 middle_aged medium  no  excellent     yes 
12 middle_aged high  yes   fair     yes 
3  senior medium  no   fair     yes 
4  senior  low  yes   fair     yes 
5  senior  low  yes  excellent     no 
9  senior medium  yes   fair     yes 
13  senior medium  no  excellent     no 
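
To make the effect of the ordering on sorting explicit, the sketch below sorts the same three values twice: once as plain strings and once as an ordered categorical. The variable names are illustrative only.

```python
import pandas as pd

ages = ["senior", "youth", "middle_aged"]

# Plain strings sort alphabetically.
plain = pd.DataFrame({"age": ages})
alphabetical = plain.sort_values("age")["age"].tolist()

# An ordered categorical sorts by the declared category order.
ordered = pd.DataFrame({
    "age": pd.Categorical(
        ages, categories=["youth", "middle_aged", "senior"], ordered=True
    )
})
by_category = ordered.sort_values("age")["age"].tolist()

print(alphabetical)
print(by_category)
```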

Is it possible to do it like this: youth …

Yes, of course, give me a moment. – jezrael