2016-07-14 38 views
2

My data looks like this:

df <- structure(list(A = c(0.91971, 0.61566, 0.78723, 1.038, 0.65656, 
0.9448, NaN, 1.1353, 0.82117, 0.15673), RA = c(NaN, 10, NaN, 
200, NaN, 0.2, NaN, NaN, 30, NaN), B = c(100, 0.2, NaN, 400, 
NaN, NaN, 20, NaN, 3, NaN), CM = c(NaN, NaN, 77, NaN, 2, NaN, 
0.02, NaN, 0.8, 1), D = c(6, 5, NaN, NaN, NaN, 0.1, 0.5, NaN, 
NaN, NaN)), .Names = c("A", "RA", "B", "CM", "D"), row.names = c(NA, 
-10L), class = "data.frame") 


#          A    RA     B    CM   D
# 1  0.91971   NaN 100.0   NaN 6.0
# 2  0.61566  10.0   0.2   NaN 5.0
# 3  0.78723   NaN   NaN 77.00 NaN
# 4  1.03800 200.0 400.0   NaN NaN
# 5  0.65656   NaN   NaN  2.00 NaN
# 6  0.94480   0.2   NaN   NaN 0.1
# 7      NaN   NaN  20.0  0.02 0.5
# 8  1.13530   NaN   NaN   NaN NaN
# 9  0.82117  30.0   3.0  0.80 NaN
# 10 0.15673   NaN   NaN  1.00 NaN

For each column, I want to count how many values fall into each of the following ranges:

  • [0, 1)
  • [1, 5)
  • [5, 10)
  • and >= 10

So the output should look something like this:

         ColumnA ColumnRA ColumnB ColumnCM ColumnD
0 to 1         7        1       1        2       2
1 to 5         2        0       1        2       0
5 to 10        0        0       0        0       2
above 10       0        3       3        1       0

I tried using sapply, but I couldn't figure out how to do it:

count0-1 <-sapply(x, function(x) sum(length(which(x >0 & <1)))) 
+0

'count0_1 <- sapply(df, function(x) sum(length(which(x > 0 & x < 1))))' – Sumedh

+0

Please state which interval endpoints are open and which are closed. – user2100721
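For reference, the attempt in the question fails to parse for two reasons: 'count0-1' is read as a subtraction rather than a name, and '& <1' has no left-hand operand. A minimal sketch of a single-range count is shown here, assuming the half-open range [0, 1) and a small made-up data frame (not the data from the question):

```r
# Hypothetical two-column data frame, used only for illustration
df_small <- data.frame(A = c(0.9, 1.2, NaN), B = c(0.2, NaN, 20))

# Count values in [0, 1) per column.
# Comparisons with NaN yield NA, so na.rm = TRUE skips them.
count_0_1 <- sapply(df_small, function(x) sum(x >= 0 & x < 1, na.rm = TRUE))
count_0_1
```

Summing the logical vector directly avoids the length(which(...)) detour, and the result is a named vector with one count per column.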

Answers

4

You can do this in one go:

df2 <- sapply(df, function(x) { 
    t(table(cut(x, 
       breaks = c(0,1,5,10,Inf), 
       right = F))) 
    }) 

rownames(df2) <- c("0 to 1", "1 to 5", "between 5 and 10", "above 10") 
colnames(df2) <- paste0("Column",colnames(df2)) 


                 ColumnA ColumnRA ColumnB ColumnCM ColumnD
0 to 1                 7        1       1        2       2
1 to 5                 2        0       1        2       0
between 5 and 10       0        0       0        0       2
above 10               0        3       3        1       0
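To check which endpoints each bin includes, it can help to run cut on a few boundary values. The sketch below uses a made-up vector (an assumption for illustration, not the question's data) and shows that right = FALSE gives intervals closed on the left and open on the right, and that NaN values are left out of the counts:

```r
# Hypothetical values sitting exactly on the break points, plus one NaN
x <- c(0.5, 1, 5, 10, NaN)

bins <- cut(x, breaks = c(0, 1, 5, 10, Inf), right = FALSE)

levels(bins)  # interval labels generated by cut
table(bins)   # NaN becomes NA and is excluded by table by default
```

With these inputs each of the four bins receives exactly one value, so 1 lands in [1,5), 5 in [5,10), and 10 in [10,Inf).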

Update

As @m0h3n suggested in the comments, using apply is a better choice:

apply(df,2,function(x) table(
cut(x, 
    breaks = c(0,1,5,10,Inf), 
    right = F, 
    labels = c("0 to 1", "1 to 5", "between 5 and 10", "and above 10")))) 

(Removed the rownames call, per @user2100721's comment.)

+1

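I would imagine using 'apply' is faster: 'apply(df, 2, function(x) table(cut(x, breaks = c(0,1,5,10,Inf), right = F)))'. First, it works column by column rather than element by element. Second, you don't need to transpose. – 989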

+0

You've already got it covered. I won't post an answer when the same idea has already been posted. +1 btw – 989

+1

Updated the answer to include your suggestion – Sumedh

1

Maybe this helps:

apply(apply(df,2,cut,breaks = c(0,1,5,10,Inf),right = F),2,table) 
+0

Thanks, I like your answer because it's handy! – nik
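One caveat that may be worth checking with the nested apply: as far as I can tell, the inner apply returns the cut labels as a character matrix rather than as factors, so the outer table only sees labels that actually occur and ranges with zero counts can drop out. The sketch below illustrates the underlying effect on a made-up vector (an assumption for illustration), comparing table on a factor with table on its character labels:

```r
# Hypothetical values: nothing falls in [1,5) or [5,10)
x <- c(0.5, 0.7, 20)

f <- cut(x, breaks = c(0, 1, 5, 10, Inf), right = FALSE)

with_levels    <- table(f)                # keeps all four bins, zeros included
without_levels <- table(as.character(f))  # only the labels that occur

with_levels
without_levels
```

If the zero rows matter, keeping the result of cut as a factor until it reaches table (as in the answers that call table(cut(...)) inside one function) preserves them.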

2

Another way to go:

rng <- c(0, 1, 5, 10, Inf) 
t(sapply(seq(head(rng,-1)), function(i) colSums(df>=rng[i] & df<rng[i+1], na.rm = T))) 

#      A RA B CM D
# [1,] 7  1 1  2 2
# [2,] 2  0 1  2 0
# [3,] 0  0 0  0 2
# [4,] 0  3 3  1 0
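The rows of this result are unnamed. A small sketch of attaching range labels follows, using a made-up data frame and label strings chosen to match the question's desired output (both are assumptions for illustration); it also uses seq_along in place of seq, which behaves the same here but stays safe if the vector has length one:

```r
rng <- c(0, 1, 5, 10, Inf)

# Hypothetical data frame, used only for illustration
df_small <- data.frame(A = c(0.5, 2, NaN), D = c(6, 5, 0.1))

# For each lower bound, count values in [rng[i], rng[i + 1]) per column
counts <- t(sapply(seq_along(head(rng, -1)), function(i) {
  colSums(df_small >= rng[i] & df_small < rng[i + 1], na.rm = TRUE)
}))

# Label rows after the ranges they represent
rownames(counts) <- c("0 to 1", "1 to 5", "5 to 10", "above 10")
counts
```

The labels have to be kept in sync with rng by hand, which is the trade-off against cut, where labels are generated from the breaks.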

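Here is a short benchmark. On this small data set, this solution (f_m0h3n2) is the fastest of the three, with a median of about 433 microseconds versus about 532 and 799 for the other two.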

library(microbenchmark) 

df <- structure(list(A = c(0.91971, 0.61566, 0.78723, 1.038, 0.65656, 
0.9448, NaN, 1.1353, 0.82117, 0.15673), RA = c(NaN, 10, NaN, 
200, NaN, 0.2, NaN, NaN, 30, NaN), B = c(100, 0.2, NaN, 400, 
NaN, NaN, 20, NaN, 3, NaN), CM = c(NaN, NaN, 77, NaN, 2, NaN, 
0.02, NaN, 0.8, 1), D = c(6, 5, NaN, NaN, NaN, 0.1, 0.5, NaN, 
NaN, NaN)), .Names = c("A", "RA", "B", "CM", "D"), row.names = c(NA, 
-10L), class = "data.frame") 

f_Sumedh <- function(df){as.matrix(sapply(df, function(x) {t(table(cut(x, breaks = c(0,1,5,10,Inf), right = F)))}))} 
f_m0h3n1 <- function(df){as.matrix(apply(df,2,function(x) table(cut(x, breaks = c(0,1,5,10,Inf), right = F, labels = c("0 to 1", "1 to 5", "between 5 and 10", "and above 10")))))} 
f_m0h3n2 <- function(df){rng <- c(0, 1, 5, 10, Inf);t(sapply(seq(head(rng,-1)), function(i) colSums(df>=rng[i] & df<rng[i+1], na.rm = T)))} 

r <- f_Sumedh(df) 
all(r==f_m0h3n1(df)) 
# [1] TRUE 
all(r==f_m0h3n2(df)) 
# [1] TRUE 

microbenchmark(f_Sumedh(df), f_m0h3n1(df), f_m0h3n2(df)) 


# Unit: microseconds
#          expr     min       lq     mean   median      uq      max neval
#  f_Sumedh(df) 715.719 768.7520 826.6880 799.0985 837.859 1709.371   100
#  f_m0h3n1(df) 482.855 512.9015 565.0632 531.8310 578.554 1460.582   100
#  f_m0h3n2(df) 371.680 412.2440 460.9897 432.9770 473.240 1190.761   100
+0

Thanks, I like your solution too – nik