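RStudio with R 3.4.0 32-bit (64-bit OS), Windows 10: no error, but values with low cell counts are not grouped into a "rare" level
I am working through the Kaggle Titanic kernel. The code runs without errors, but it does not produce the expected result. The title is taken from the passenger names.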
str(full)
'data.frame': 1309 obs. of 13 variables:
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ...
$ Pclass : int 3 1 3 1 3 3 1 3 3 2 ...
$ Name : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
$ Sex : chr "male" "female" "female" "female" ...
$ Age : num 22 38 26 35 35 NA 54 2 27 14 ...
$ SibSp : int 1 1 0 1 0 0 0 3 0 1 ...
$ Parch : int 0 0 0 0 0 0 0 1 2 0 ...
$ Ticket : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
$ Fare : num 7.25 71.28 7.92 53.1 8.05 ...
$ Cabin : chr "" "C85" "" "C123" ...
$ Embarked : chr "S" "C" "S" "S" ...
$ Title : chr " Mr" " Mrs" " Miss" " Mrs" ...
Extracting the title:
full$Title <- gsub('(.*,)|(\\..*)','',full$Name)
# Show title counts by sex
table(full$Sex, full$Title)
# Titles with very low cell counts to be combined to "rare" level
rare_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')
# Also reassign mlle, ms, and mme accordingly
full$Title[full$Title == 'Mlle'] <- 'Miss'
full$Title[full$Title == 'Ms'] <- 'Miss'
full$Title[full$Title == 'Mme'] <- 'Mrs'
full$Title[full$Title %in% rare_title] <- 'Rare Title'
# Show title counts by sex again
table(full$Sex, full$Title)
       Capt Col Don Dona Dr Jonkheer Lady Major Master Miss Mlle
female    0   0   0    1  1        0    1     0      0  260    2
male      1   4   1    0  7        1    0     2     61    0    0
       Mme  Mr Mrs Ms Rev Sir the Countess
female   1   0 197  2   0   0            1
male     0 757   0  0   8   1            0
I cannot understand why the values are not grouped into the rare level, even though I get no errors. Why does this happen?
Can you provide the output of str(full)?
I have added str(full) to the question. Please take a look.
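One possible explanation, judging only from the str(full) output in the question: the Title values are printed as " Mr", " Mrs", " Miss", each with a leading space. If that is the case, exact comparisons such as full$Title == 'Mlle' and full$Title %in% rare_title would never match, which would give no error and no change. The sketch below reproduces this on two names copied from that output and shows two ways to drop the space. The variable names posted, trimmed and spaced are made up for illustration, and the pattern with a space after the comma is a suggestion, not something confirmed in the thread.

```r
# Names copied from the str(full) output in the question
name <- c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina")

# Pattern as posted: '(.*,)' stops at the comma, so the space after it is kept
posted <- gsub('(.*,)|(\\..*)', '', name)
print(posted)             # each extracted title starts with a space
print(posted == "Miss")   # exact comparison fails for every element

# Option 1: strip surrounding whitespace after extraction (base R, 3.2.0+)
trimmed <- trimws(posted)

# Option 2 (assumed fix): let the pattern consume the space after the comma
spaced <- gsub('(.*, )|(\\..*)', '', name)

print(trimmed == "Miss")            # now matches the second name
print(identical(trimmed, spaced))   # both options give the same titles
```

Applied to the real data, either option would be run on full$Name or full$Title before the reassignment lines, so that 'Mlle', 'Ms', 'Mme' and the entries of rare_title can match.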