2015-04-03

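Subsetting a data frame loses all my observations

I have a data frame, Test, that I want to subset, but when I try I lose all of my observations. Why is this happening?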

> str(Test) 
'data.frame': 157025 obs. of 13 variables: 
$ Cancellations : int 1 1 1 1 1 1 1 1 1 1 ... 
$ Benefit   : chr "Single Parent Support       "    "Single Parent Support       " "Job Seeker           " "Job Seeker          " ... 
$ Region   : chr "  Northland " "  Northland " "   Northland " "  Northland " ... 
$ Month   : chr "Jun 14" "Jun 14" "Jun 14" "Jun 14" ... 
$ CanReason  : chr "Change in Marital Status   " "Change in  Marital Status   " "Change in Marital Status   " "Change in  Marital Status   " ... 
$ Age    : chr " 20-24 " " 20-24 " " 20-24 " " 20-24 " ... 
$ Ethnicity  : chr "NZ European/Pakeha" "Maori    " "Other      " "NZ European/Pakeha" ... 
$ SMS    : chr "General Case Management    " "Work  Focused Case Management   " "Work Focused Case Management   " "Work  Search Support     " ... 
$ Duration   : chr "2-4 yrs " "2-4 yrs " "6-9 mth " "0-3 mth " ... 
$ SMSDuration  : int 361 348 59 69 150 37 63 294 107 107 ... 
$ AgeYoungest  : chr "0-4 yrs " "0-4 yrs " "No Children" "No Children" ... 
$ AgeYoungestNonSub: chr "0-4 yrs" "0-4 yrs" "No Children" "No Children" ... 
$ Liability  : chr " 166,000 " " 166,000 " " 102,000 " " 102,000 " ... 


> subDie <- Test[CanReason == "Died",] 

> str(subDie) 
'data.frame': 0 obs. of 13 variables: 
$ Cancellations : int 
$ Benefit   : chr 
$ Region   : chr 
$ Month   : chr 
$ CanReason  : chr 
$ Age    : chr 
$ Ethnicity  : chr 
$ SMS    : chr 
$ Duration   : chr 
$ SMSDuration  : int 
$ AgeYoungest  : chr 
$ AgeYoungestNonSub: chr 
$ Liability  : chr 

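I have tried converting the factor variables to character. When I put the comma before the "CanReason" index (subDie <- Test[, CanReason == "Died"]), R tells me I have 157025 observations of 15 variables... I'm stumped.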

Is it exactly "Died", or "Died" plus extra whitespace? – 2015-04-03 02:36:27

This is probably due to the extra whitespace that @Pascal mentioned, but 'dput(head(Test))' would be more useful than 'str'. – Molx 2015-04-03 02:39:16

I just tried "Died" (1 extra space), "Died" (2 extra spaces) and "Died" (3 extra spaces). No such luck. – 2015-04-03 02:53:10

Answer

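Use a regular expression to search the character vector CanReason for the string "Died". grepl() returns a logical vector indicating, for each element, whether it matches, so leading or trailing whitespace no longer matters. Use that vector to subset Test.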

For example:

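set.seed(12)
# Simulated CanReason values, some padded with whitespace as in Test
CanReason <- sample(c("Change in  Marital status",
                      "Change in Marital status ",
                      " Died ",
                      "Died    ",
                      "Died"), 10000, replace = TRUE)
# TRUE wherever the string contains "Died", regardless of padding
ind <- grepl("Died", CanReason)

sum(ind)
length(CanReason[ind])
head(CanReason[ind])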

which gives:

> sum(ind) 
[1] 6037 
> length(CanReason[ind]) 
[1] 6037 
> head(CanReason[ind]) 
[1] "Died"     "Died"     "Died    " 
[4] "Died"     " Died "   " Died "
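
To carry this over to the data frame itself, something along these lines should work. This is only a sketch: the small Test below is a made-up stand-in for the real data (its values just mimic the padded strings in the str() output), and trimws() assumes R 3.2.0 or later. Note that the column is referenced as Test$CanReason, so the comparison does not depend on a separate CanReason object existing in the workspace.

```r
# Hypothetical stand-in for the real Test data frame (illustration only)
Test <- data.frame(
  Cancellations = c(1L, 1L, 1L, 1L),
  CanReason = c("Change in Marital Status   ", " Died ", "Died    ", "Died"),
  stringsAsFactors = FALSE
)

# Exact comparison only matches the value that has no padding at all
nExact <- nrow(Test[Test$CanReason == "Died", ])

# Pattern match: keep every row whose CanReason contains "Died"
subDie <- Test[grepl("Died", Test$CanReason), ]

# Alternative: strip leading/trailing whitespace, then compare exactly
subDie2 <- Test[trimws(Test$CanReason) == "Died", ]
```

The grepl() version also keeps any value that merely contains "Died", whereas the trimws() version only keeps values equal to "Died" once the padding is removed, so pick whichever fits the data.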