2017-06-18

RStudio 3.4.0 32-bit (64-bit operating system), Windows 10: no error, but values with low cell counts are not grouped into the rare level

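I am working through the Kaggle Titanic kernel. The code runs without any error, but it does not produce the expected result. The titles are extracted from the passenger names.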

str(full) 
'data.frame': 1309 obs. of 13 variables: 
$ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ... 
$ Survived : int 0 1 1 1 0 0 0 0 1 1 ... 
$ Pclass  : int 3 1 3 1 3 3 1 3 3 2 ... 
$ Name  : chr "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ... 
$ Sex  : chr "male" "female" "female" "female" ... 
$ Age  : num 22 38 26 35 35 NA 54 2 27 14 ... 
$ SibSp  : int 1 1 0 1 0 0 0 3 0 1 ... 
$ Parch  : int 0 0 0 0 0 0 0 1 2 0 ... 
$ Ticket  : chr "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ... 
$ Fare  : num 7.25 71.28 7.92 53.1 8.05 ... 
$ Cabin  : chr "" "C85" "" "C123" ... 
$ Embarked : chr "S" "C" "S" "S" ... 
$ Title  : chr " Mr" " Mrs" " Miss" " Mrs" ... 

Grab the title:

full$Title <- gsub('(.*,)|(\\..*)','',full$Name) 

# Show title counts by sex 
table(full$Sex, full$Title) 

# Titles with very low cell counts to be combined to "rare" level 
rare_title <- c ('Dona', 'Lady', 'the Countess','Capt', 'Col', 'Don', 
       'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer') 

# Also reassign mlle, ms, and mme accordingly 
full$Title[full$Title == 'Mlle']  <- 'Miss' 
full$Title[full$Title == 'Ms']   <- 'Miss' 
full$Title[full$Title == 'Mme']   <- 'Mrs' 
full$Title[full$Title %in% rare_title] <- 'Rare Title' 

# Show title counts by sex again 
table(full$Sex, full$Title) 

     Capt Col Don Dona Dr Jonkheer Lady Major Master Miss Mlle 
    female  0 0 0  1 1   0  1  0  0 260  2 
    male  1 4 1  0 7   1  0  2  61  0  0 

      Mme Mr Mrs Ms Rev Sir the Countess 
    female 1 0 197 2 0 0    1 
    male  0 757 0 0 8 1    0 

I cannot understand why the values are not grouped into the rare level, even though I get no error. Why does this happen?

Can you provide the output of str(full)?

I have added str(full). Please take a look.

Answer


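The problem is that your titles have a leading space. As you can see in str(full), a title looks like " Mr" rather than "Mr". Both == and %in% compare strings exactly, so " Major" never matches 'Major', nothing is reassigned, and no error is raised.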
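A quick check on one name from the str(full) output shows where the space comes from. The pattern removes everything up to and including the comma, but not the space that follows it. This is only a small sketch; the variable names `name` and `title` are chosen here for illustration.

```r
# One passenger name taken from the str(full) output in the question
name <- "Braund, Mr. Owen Harris"

# The question's pattern: drop everything up to the comma,
# and everything from the first period onwards
title <- gsub('(.*,)|(\\..*)', '', name)

print(title)                  # the space that followed the comma is still there
print(title == "Mr")          # exact comparison against "Mr"
print(title %in% c("Mr"))     # %in% is an exact match as well
print(trimws(title) == "Mr")  # comparison after stripping whitespace
```

The first two comparisons are FALSE and the last one is TRUE, which is the silent mismatch described above.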

You can fix this with trimws:

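# Toy data reproducing the problem: every title has a leading space
full <- data.frame(Title = c(" Mr", " Mrs", " Miss", " Major", " Don"),
                   age = 1:5, stringsAsFactors = FALSE)
rare_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')
# Strip the whitespace for the comparison only, then reassign the matches
full$Title[trimws(full$Title) %in% rare_title] <- 'Rare Title'
full$Title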

[1] " Mr"  " Mrs"  " Miss"  "Rare Title" "Rare Title" 
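Note that the line above trims only inside the %in% test, so the titles that do not match keep their leading space, as the output shows, and the == reassignments in the question (Mlle, Ms, Mme) would still not match. A minimal sketch, assuming you would rather clean the column once, right after extraction: the first two names come from the question, while "Doe, Mlle. Jane" and "Roe, Major. John" are hypothetical names made up for illustration.

```r
# Toy data: the first two names appear in the question,
# the last two are hypothetical
full <- data.frame(
  Name = c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina",
           "Doe, Mlle. Jane", "Roe, Major. John"),
  stringsAsFactors = FALSE
)

# Extract the title as in the question, then strip surrounding whitespace once
full$Title <- trimws(gsub('(.*,)|(\\..*)', '', full$Name))

rare_title <- c('Dona', 'Lady', 'the Countess', 'Capt', 'Col', 'Don',
                'Dr', 'Major', 'Rev', 'Sir', 'Jonkheer')

# With clean titles, both == and %in% match as intended
full$Title[full$Title == 'Mlle'] <- 'Miss'
full$Title[full$Title %in% rare_title] <- 'Rare Title'

print(full$Title)
```

Another option is to put the space into the pattern itself, '(.*, )|(\\..*)', so that it is removed together with the comma.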

I just added a space before "Mr" and it worked. I had never noticed such a small detail. Thank you very much.