2014-07-08 52 views
0

Question: replace a 4-digit number with its first digit

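I have 7 columns in which more than 30% of the rows are NA. All of my columns are numeric.

For these columns with many missing values, I want to create 4 new columns based on each column's quantile values.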

1st column: 1 in rows that contain data; 0 otherwise
2nd column: 1 in rows below the first quantile; 0 otherwise
3rd column: 1 in rows that fall in the 2nd quantile range; 0 otherwise
4th column: 1 in rows that are above the 3rd quantile; 0 otherwise

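I have the first column working. The rest, where the thresholds depend on the quantiles, are the challenge.

My next 3 columns are based on just 3 quantiles: 33.33333%, 66.66667% and 100%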

quantile(High_NAS_set1$EFX, prob=c(33/99,66/99,99/99),na.rm=TRUE) 

Here is what I have so far...

#1st column: assign 1 for a row that contains data; 0 otherwise 

New.EFX_<-High_NAS_set1$EFX #creating a new column 


New.EFX_Emp_Total[!is.na(New.EFX)]<-1 
New.EFX_Emp_Total[is.na(New.EFX)]<-0 


#2nd Column:assign 1 in rows below the first quantile; 0 otherwise 

New.EFX2_<-High_NAS_set1$EFX #creating a new column 

quant<-quantile(New2.EFX_Emp,probs=33/99,na.rm=TRUE) 

which(New2.EFX_Emp_Total<=quant)<-1 # assign 1 for rows which indexes are below quant 
which(New2.EFX_Emp_Total!=quant)<-0 

The last 2 lines give me an error:

Error in which(New2.EFX_Emp_Total <= quant) <- 1 : 
    could not find function "which<-" 
+1

First off, I'm not an R programmer, so speaking as a PHP developer: I would divide #### by 1000 and then round it to the nearest integer.

+1

That is not how 'which' is used. 'New2.EFX_Emp_Total[which(New2.EFX_Emp_Total <= quant)] <- 1'
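The point of this comment can be illustrated with a small sketch. The vector `x` and the result vector `below` are hypothetical stand-ins with made-up values, not the asker's data; the idea is only that `which()` returns positions and has to be used as an index on the target vector, not as the target of `<-`.

```r
# Toy data standing in for High_NAS_set1$EFX (values are made up)
x <- c(5, NA, 12, 7, NA, 30, 18, 2, 25, 40)

# Threshold at the 1/3 quantile, ignoring missing values
quant <- quantile(x, probs = 1/3, na.rm = TRUE)

# Start with all zeros, then set 1 at the positions that which() returns.
# which() skips NA comparisons, so rows with missing data stay 0.
below <- rep(0, length(x))
below[which(x <= quant)] <- 1

below
```

Because the flag vector starts as all zeros, there is no need for a second assignment to fill in the 0 cases.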

+0

You *do* know that 33/99 is 1/3, and that 99/99 is 1? Keep the code simple :-)

Answer

0

One approach:

qtl <- quantile(High_NAS_set1$EFX, probs = c(1/3, 2/3, 1), na.rm = TRUE)

# Each flag is cumulative (1 if EFX is at or below that cut point); rows where EFX is NA stay NA
High_NAS_set1$EFX033 <- ifelse(High_NAS_set1$EFX <= qtl[1], 1, 0)
High_NAS_set1$EFX066 <- ifelse(High_NAS_set1$EFX <= qtl[2], 1, 0)
High_NAS_set1$EFX100 <- ifelse(High_NAS_set1$EFX <= qtl[3], 1, 0)
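If the question is read as asking for mutually exclusive bands (lower third, middle third, upper third) with 0 instead of NA for missing rows, a variant could look like the sketch below. This is one interpretation, not the answerer's method: the data frame `df`, its values, and the column names `EFX_has_data`, `EFX_low`, `EFX_mid` and `EFX_high` are all hypothetical.

```r
# Toy data frame standing in for High_NAS_set1 (values are made up)
df <- data.frame(EFX = c(5, NA, 12, 7, NA, 30, 18, 2, 25, 40))

# Cut points at the 1/3 and 2/3 quantiles, ignoring missing values
qtl <- quantile(df$EFX, probs = c(1/3, 2/3), na.rm = TRUE)

# Column 1: 1 where the row has data, 0 where it is NA
has_data <- !is.na(df$EFX)
df$EFX_has_data <- as.integer(has_data)

# Columns 2-4: one flag per band. `has_data &` forces NA rows to FALSE,
# because FALSE & NA evaluates to FALSE in R.
df$EFX_low  <- as.integer(has_data & df$EFX <= qtl[1])
df$EFX_mid  <- as.integer(has_data & df$EFX > qtl[1] & df$EFX <= qtl[2])
df$EFX_high <- as.integer(has_data & df$EFX > qtl[2])

df
```

With this layout every row that has data gets exactly one of the three band flags, and rows with missing data get 0 in all four columns.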