Check whether all combinations of specified columns of a data frame occur equally often

You have a data frame. What is the best way to check whether all combinations of the values of particular columns occur equally often?

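(This comes up when processing data files from a factorial design: each column is an independent variable, and we want to check that all combinations of the independent variables occur equally often.)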

2013-06-19 Alex Holcombe

How about replications()?

tmp <- transform(ToothGrowth, dose = factor(dose)) 

replications(~ supp + dose, data = tmp) 
replications(~ supp * dose, data = tmp) 

> replications(~ supp + dose, data = tmp)
supp dose
  30   20
> replications(~ supp * dose, data = tmp)
     supp      dose supp:dose
       30        20        10

And from ?replications we have a test for balance:

!is.list(replications(~ supp + dose, data = tmp)) 

> !is.list(replications(~ supp + dose, data = tmp)) 
[1] TRUE

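The output of replications() (the number of replicates per level of each term) may not be quite what you expect, but the test shown above gives you the answer you want.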
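To see the balance test fail as well as pass, here is a minimal sketch that drops one observation from the data. The helper name `is_balanced` is an assumption made for illustration and is not part of base R; it relies only on the documented behaviour that replications() returns a vector for balanced data and a list otherwise.

```r
# Sketch only: `is_balanced` is a hypothetical helper, not part of base R.
tmp <- transform(ToothGrowth, dose = factor(dose))

is_balanced <- function(data) {
  # replications() returns a vector for balanced data and a list otherwise
  !is.list(replications(~ supp * dose, data = data))
}

unbalanced <- tmp[-1, ]  # drop a single observation

is_balanced(tmp)         # complete data
is_balanced(unbalanced)  # one combination is now short by one row
```

The first call should report balance and the second should not. Note that dose has to be a factor, as in tmp, because replications() ignores non-factor terms.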


2013-06-19 22:35:23

checkAllCombosOccurEquallyOften <- function(df, colNames, dropZeros = FALSE) {
    # In data.frame df, check whether the factors named in colNames form a
    # full factorial design (all combinations of levels occur equally often).
    #
    # dropZeros is useful if one factor is nested in the others, e.g. when a
    # different set of speeds is tested at each level of something else: many
    # combinations then occur 0 times because that speed does not exist for
    # that level. It is dangerous, though, because it also hides zeros that
    # occur for the wrong reason (a design that is not fully crossed).
    #
    # Returns:
    #   TRUE/FALSE, and prints an informational message
    #
    listOfCols <- as.list(df[colNames])
    counts <- table(listOfCols)  # named 'counts' to avoid masking base::t()

    if (dropZeros) {
        counts <- counts[counts != 0]
    }
    distinctCounts <- unique(as.vector(counts))
    colNamesStr <- paste(colNames, collapse = ",")
    if (length(distinctCounts) == 1) {  # fully crossed: every cell count is identical
        print(paste(colNamesStr, "fully crossed- each combination occurred", distinctCounts[1], 'times'))
        ans <- TRUE
    } else {
        print(paste(colNamesStr, "NOT fully crossed,", length(distinctCounts), 'distinct repetition numbers.'))
        ans <- FALSE
    }
    return(ans)
}

Load the datasets package and call the function above:

library(datasets) 
checkAllCombosOccurEquallyOften(ToothGrowth,c("supp","dose")) #specify dataframe and columns

The output gives the answer: the design is fully crossed.

[1] "supp,dose fully crossed- each combination occurred 10 times" 
[1] TRUE
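The warning about dropping zeros can be made concrete with a small sketch. The helper `all_combos_equal` and its `drop_zeros` argument are hypothetical names introduced here for illustration; the helper is a compact stand-in for the same table-based check, not the function above.

```r
library(datasets)

# Hypothetical helper (illustration only): TRUE when every combination of
# the columns in `cols` occurs the same number of times.
all_combos_equal <- function(df, cols, drop_zeros = FALSE) {
  counts <- as.vector(table(df[cols]))
  if (drop_zeros) counts <- counts[counts != 0]
  length(unique(counts)) == 1
}

cols <- c("supp", "dose")

# Remove one whole combination, so the design is no longer fully crossed
missing_cell <- subset(ToothGrowth, !(supp == "OJ" & dose == 2))

all_combos_equal(ToothGrowth, cols)                      # complete data
all_combos_equal(missing_cell, cols)                     # zero count is detected
all_combos_equal(missing_cell, cols, drop_zeros = TRUE)  # zero count is hidden
```

With the zero counts dropped, the data set with a missing combination passes the check even though it is not fully crossed, which is why that option should be used only for designs known to be nested.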


2013-06-19 07:56:14

I wouldn't call these "combinations", but rather "groups" or something similar.

Using the same ToothGrowth data:

library(datasets) 
library(data.table) 

dt = data.table(ToothGrowth) 

setkey(dt, supp, dose) 
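# Count rows for every supp/dose combination. Older data.table versions grouped
# implicitly here ("by-without-by"); since 1.9.4, by = .EACHI is required.
dt[CJ(unique(supp), unique(dose)), .N, by = .EACHI]
#    supp dose  N
# 1:   OJ  0.5 10
# 2:   OJ  1.0 10
# 3:   OJ  2.0 10
# 4:   VC  0.5 10
# 5:   VC  1.0 10
# 6:   VC  2.0 10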

You can then check whether all the N values are equal, or do whatever else you like with them.
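One way to do that final comparison is sketched below. It assumes the data.table package is installed and is recent enough (1.9.4 or later) to support `by = .EACHI`; the variable names `counts` and `all_equal` are illustrative.

```r
library(datasets)
library(data.table)

dt <- as.data.table(ToothGrowth)
setkey(dt, supp, dose)

# One row per supp/dose combination, including combinations with no rows
counts <- dt[CJ(unique(supp), unique(dose)), .N, by = .EACHI]

# TRUE when every combination occurs the same number of times
all_equal <- length(unique(counts$N)) == 1
all_equal
```

Because CJ() builds the full cross of the observed levels, a combination that never occurs still gets a row in `counts`, so it is not silently skipped by the comparison.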


2013-06-19 15:18:04 eddi
