2017-06-01 57 views

How do I remove outliers from a data frame?

I have a data set like this:

dput(head(data,20)) 
structure(list(Date = structure(c(1495722600, 1495723500, 1495724400, 
1495725300, 1495726200, 1495727100, 1495728000, 1495728900, 1495729800, 
1495730700, 1495731600, 1495732500, 1495733400, 1495734300, 1495735200, 
1495736100, 1495737000, 1495737900, 1495738800, 1495739700), class = c("POSIXct", 
"POSIXt"), tzone = ""), JVM_CPU = c(1.07500004768372, 1.75, 10.6979999542236, 
2.40000009536743, 2.42400002479553, 5.80000019073486, 6.80000019073486, 
1.85000002384186, 8.52499961853027, 0.800000011920929, 12.7740001678467, 
0.174999997019768, 0.499000012874603, 0.248999997973442, 6.82499980926514, 
1.125, 0.949000000953674, 0.874000012874603, 6.55000019073486, 
0.248999997973442)), .Names = c("Date", "JVM_CPU"), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame")) 

I need a subset of this data set that contains no outliers.

I can remove the outliers from the data$JVM_CPU vector on its own like this:

data_cpu$JVM_CPU[!data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out] 

But I need to remove the outlier rows from the data frame itself, not just from the vector. Any ideas how I can do this?

Answers

1

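You can first determine which rows of the data frame to keep (i.e. the ones that are not outliers), and then use that logical vector to subset the data frame.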

keep <- !data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out 
data_cpu[keep, ] 
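
To see the logical-vector approach end to end, here is a minimal sketch on a small made-up data frame. The names df, id, value, out_values and df_clean are assumptions for illustration and are not part of the question's data.

```r
# Hypothetical sample data (not from the question): row 6 holds an obvious outlier.
df <- data.frame(id = 1:6, value = c(1, 2, 3, 4, 5, 100))

# Values that boxplot.stats() flags as outliers
# (by default, more than 1.5 * IQR beyond the hinges).
out_values <- boxplot.stats(df$value)$out

# TRUE for rows to keep, FALSE for rows whose value is an outlier.
keep <- !df$value %in% out_values

# Subset by row; every column of the data frame is retained.
df_clean <- df[keep, ]
print(df_clean)
```

Because keep has one element per row, the same vector can be reused to inspect the dropped rows with df[!keep, ].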
1

Use the outlier test to find the row indices of the outliers, then drop those rows with a negative index.

data_cpu[-which(data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out), ] 
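
One caveat with the -which() form is worth checking: when no value is flagged, which() returns an empty integer vector, and a negated empty index selects no rows at all. The sketch below uses a made-up data frame (df, id, value, is_out are illustrative names) to compare it with the logical index.

```r
# Hypothetical data with no outliers at all.
df <- data.frame(id = 1:5, value = c(1, 2, 3, 4, 5))

# All FALSE here, because boxplot.stats() flags nothing.
is_out <- df$value %in% boxplot.stats(df$value)$out

# which(is_out) is integer(0); negating it is still integer(0),
# so this selects zero rows instead of keeping everything.
via_which <- df[-which(is_out), ]

# The logical index keeps every row, which is the intended result.
via_logical <- df[!is_out, ]

print(nrow(via_which))
print(nrow(via_logical))
```

If the data may contain no outliers, the logical form is the safer of the two.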

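Alternatively, the expression in your example already returns TRUE for the rows you want to keep and FALSE otherwise, so you can use it directly as the row index.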

data_cpu[!data_cpu$JVM_CPU %in% boxplot.stats(data_cpu$JVM_CPU)$out, ]
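
If the same filter is needed for several columns or data frames, the one-liner can be wrapped in a small function. This is only a sketch: remove_outliers and its arguments are hypothetical names, and coef is simply passed through to the coef argument of boxplot.stats() (default 1.5).

```r
# Hypothetical helper (name and arguments are illustrative, not from the answers).
remove_outliers <- function(df, column, coef = 1.5) {
  x <- df[[column]]
  # Values lying more than coef * IQR beyond the hinges.
  out_values <- boxplot.stats(x, coef = coef)$out
  # drop = FALSE keeps the result a data frame even with a single column.
  df[!x %in% out_values, , drop = FALSE]
}

# Made-up example data: row 6 is an outlier under the default coef.
df <- data.frame(id = 1:6, value = c(1, 2, 3, 4, 5, 100))
cleaned <- remove_outliers(df, "value")
print(cleaned)
```

For the question's data the call would look like remove_outliers(data_cpu, "JVM_CPU"). Rows with NA in the column are kept, since boxplot.stats() does not report NA as an outlier.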