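Joint occurrence of variables in R

I want to count the individual and joint occurrences of variables (1 represents presence and 0 represents absence). This can be obtained through multiple uses of the table function (see the MWE below). I would appreciate a more efficient way to get the required output given below. Thanks.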

set.seed(12345) 
A <- rbinom(n = 100, size = 1, prob = 0.5) 
B <- rbinom(n = 100, size = 1, prob = 0.6) 
C <- rbinom(n = 100, size = 1, prob = 0.7) 
df <- data.frame(A, B, C) 

table(A)
A
 0  1
48 52

table(B)
B
 0  1
53 47

table(C)
C
 0  1
34 66

table(A, B)
   B
A    0  1
  0 25 23
  1 28 24

table(A, C)
   C
A    0  1
  0 12 36
  1 22 30

table(B, C)
   C
B    0  1
  0 21 32
  1 13 34

table(A, B, C)
, , C = 0

   B
A    0  1
  0  8  4
  1 13  9

, , C = 1

   B
A    0  1
  0 17 19
  1 15 15

Required output

I need something like the following:

A = 52 
B = 47 
C = 66 
A + B = 24 
A + C = 30 
B + C = 34 
A + B + C = 15

Source

2016-06-28 MYaseen208

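How exactly should the output be structured? For many of the counts above, there is also 'crossprod(as.matrix(df))' –

So you don't want to count 'A' separately from 'AB'? – TARehman

Yes, you are correct @TARehman – MYaseen208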
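To illustrate the crossprod() suggestion in the comment above, here is a minimal sketch on the question's data. It only covers single variables and pairs, not the three-way count, and the name co_occurrence is chosen here for illustration.

```r
set.seed(12345)
A <- rbinom(n = 100, size = 1, prob = 0.5)
B <- rbinom(n = 100, size = 1, prob = 0.6)
C <- rbinom(n = 100, size = 1, prob = 0.7)
df <- data.frame(A, B, C)

# crossprod(X) computes t(X) %*% X. For 0/1 columns, entry [i, j] is the
# number of rows in which column i and column j are both 1.
co_occurrence <- crossprod(as.matrix(df))

# The diagonal holds the single-variable counts (A, B, C)
diag(co_occurrence)

# An off-diagonal entry holds a pairwise count, e.g. A == 1 and B == 1
co_occurrence["A", "B"]
```

Because the matrix is symmetric, each pairwise count appears twice, once on each side of the diagonal.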

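Expanding on Sumedh's answer, you can also do this dynamically, without having to specify the filter each time. This is very useful if you have more than three columns to combine.

You can do something like this: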

library(dplyr)

# Note: filter_() is deprecated since dplyr 0.7
lapply(seq_len(ncol(df)), function(i) {
    # Generate all combinations of i elements over all columns
    tmp_i <- utils::combn(names(df), i)
    # Each column of tmp_i holds the elements of one combination
    apply(tmp_i, 2, function(x) {
        # Build the filter condition, e.g. ~ A == 1 & B == 1
        dynamic_formula <- as.formula(paste("~", paste(x, "== 1", collapse = " & ")))
        df %>%
            filter_(.dots = dynamic_formula) %>%
            summarize(Count = n()) %>%
            mutate(type = paste0(sort(x), collapse = ""))
    }) %>%
        bind_rows()
}) %>%
    bind_rows()

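This will:

1) Generate all combinations of the columns of df: first the one-element combinations (A, B, C), then the two-element ones (AB, AC, BC), and so on. This is the outer lapply.

2) For each combination, create a dynamic formula. For AB, for example, the formula will be A == 1 & B == 1, as Sumedh suggested. This is the dynamic_formula bit.

3) Filter the data frame with the dynamically generated formula and count the number of rows.

4) Bind everything together (the two bind_rows calls).

The output will be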

  Count type
1    52    A
2    47    B
3    66    C
4    24   AB
5    30   AC
6    34   BC
7    15  ABC
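The four steps above can also be sketched in base R, without building a formula. This is only an illustrative variant under the same 0/1 coding; the function name count_combinations is hypothetical and does not come from the answer.

```r
set.seed(12345)
A <- rbinom(n = 100, size = 1, prob = 0.5)
B <- rbinom(n = 100, size = 1, prob = 0.6)
C <- rbinom(n = 100, size = 1, prob = 0.7)
df <- data.frame(A, B, C)

# Hypothetical helper: joint counts for every non-empty set of columns
count_combinations <- function(data) {
    # Step 1: all combinations of 1, 2, ..., ncol(data) column names
    combos <- unlist(
        lapply(seq_along(data), function(m) combn(names(data), m, simplify = FALSE)),
        recursive = FALSE
    )
    # Steps 2 and 3: a row counts when all selected columns equal 1
    counts <- vapply(combos, function(vars) {
        sum(rowSums(data[, vars, drop = FALSE] == 1) == length(vars))
    }, integer(1))
    # Step 4: collect everything in one data frame
    data.frame(Count = counts,
               type = vapply(combos, paste, character(1), collapse = ""))
}

result <- count_combinations(df)
result
```

The drop = FALSE keeps a single selected column as a data frame, so rowSums() works for one-element combinations as well.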

Source

2016-06-28 14:33:41

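Thanks @Lorenzo for the useful answer. I would appreciate it if you could explain _This is very useful if you have more than three columns to combine._ – MYaseen208

I mean that you can use exactly the same solution whether the data frame has 3 columns to combine (A, B, C) or 4, 5, or 6 columns. So if you also add D <- rbinom(n = 100, size = 1, prob = 0.5), E <- rbinom(n = 100, size = 1, prob = 0.6), and so on, it will still work and count all the combinations –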
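As a rough illustration of that comment, the sketch below adds the two extra columns D and E mentioned there. The names df5, n_combos, vars and n_ACE are made up for this example. With k columns there are 2^k - 1 non-empty combinations, so the list of counts grows quickly.

```r
set.seed(12345)
df5 <- data.frame(
    A = rbinom(n = 100, size = 1, prob = 0.5),
    B = rbinom(n = 100, size = 1, prob = 0.6),
    C = rbinom(n = 100, size = 1, prob = 0.7),
    D = rbinom(n = 100, size = 1, prob = 0.5),
    E = rbinom(n = 100, size = 1, prob = 0.6)
)

# Number of non-empty column combinations: 2^5 - 1 = 31
n_combos <- sum(choose(ncol(df5), seq_len(ncol(df5))))

# Joint count for any chosen subset of columns, here A, C and E:
# combine the per-column conditions with a logical AND
vars <- c("A", "C", "E")
n_ACE <- sum(Reduce(`&`, lapply(df5[vars], function(col) col == 1)))
```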

Using dplyr, occurrences of A only:

library(dplyr) 
df %>% filter(A == 1) %>% summarise(Total = n())

Occurrences of A and B:

df %>% filter(A == 1, B == 1) %>% summarise(Total = n())

Occurrences of A, B, and C:

df %>% filter(A == 1, B == 1, C == 1) %>% summarise(Total = n())
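For comparison, the same three counts can be sketched in base R by summing a logical condition, since TRUE counts as 1. The variable names n_A, n_AB and n_ABC are chosen here for illustration only.

```r
set.seed(12345)
A <- rbinom(n = 100, size = 1, prob = 0.5)
B <- rbinom(n = 100, size = 1, prob = 0.6)
C <- rbinom(n = 100, size = 1, prob = 0.7)
df <- data.frame(A, B, C)

# Occurrences of A
n_A <- sum(df$A == 1)

# Occurrences of A and B together
n_AB <- sum(df$A == 1 & df$B == 1)

# Occurrences of A, B, and C together; with() avoids repeating df$
n_ABC <- with(df, sum(A == 1 & B == 1 & C == 1))
```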

Source

2016-06-28 14:18:04 Sumedh

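Edited to add: I now see that you do not want exclusive counts (i.e., A and AB should both include all the As).

I got a little nerd-sniped by this today, especially because I wanted to solve it in base R with no packages. The code below should do it.

There is a very simple (in principle) solution that just uses xtabs(), which I illustrate below. However, generalizing it to any possible number of dimensions, and then applying it to the various combinations, was actually harder. I tried hard to avoid the dreaded eval(parse()).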

set.seed(12345) 
A <- rbinom(n = 100, size = 1, prob = 0.5) 
B <- rbinom(n = 100, size = 1, prob = 0.6) 
C <- rbinom(n = 100, size = 1, prob = 0.7) 
df <- data.frame(A, B, C) 

# Turn stringsAsFactors off 
options(stringsAsFactors = FALSE) 

# Obtain the n-way frequency table 
# This table can be directly subset using [] 
# It is a little tricky to pass the arguments 
# I'm trying to avoid eval(parse()) 
# But still give a solution that isn't bound to a specific size 
xtab_freq <- xtabs(formula = formula(x = paste("~",paste(names(df),collapse = " + "))), 
        data = df) 

# Demonstrating what I mean 
# All A 
sum(xtab_freq["1",,]) 
# [1] 52 

# AC 
sum(xtab_freq["1",,"1"]) 
# [1] 30 

# Using lapply(), we pass names(df) to combn() with m values of 1, 2, ..., ncol(df) 
# The output of combn() goes through list(), then is unlisted with recursive FALSE 
# This gives us a list of vectors 
# Each one being a combination in which we are interested 
lst_combs <- unlist(lapply(X = seq_along(df),FUN = combn,x = names(df),list),recursive = FALSE) 

# For nice output naming, I just paste the values together 
names(lst_combs) <- sapply(X = lst_combs,FUN = paste,collapse = "") 

# This is a function I put together 
# Generalizes process of extracting values from a crosstab 
# It does it in this fashion to avoid eval(parse()) 
uFunc_GetMargins <- function(crosstab,varvector,success) { 

    # Obtain the dimnames (the level labels within each dimension) 
    # From that, get the names of the dimensions (the variable names) 
    xtab_dnn <- dimnames(crosstab) 
    xtab_dn <- names(xtab_dnn) 

    # Use match() to get a numeric vector for the margins 
    # This can be used in margin.table() 
    tgt_margins <- match(x = varvector,table = xtab_dn) 

    # Obtain a margin table 
    marginal <- margin.table(x = crosstab,margin = tgt_margins) 

    # To extract the value, figure out which marginal cell contains 
    # all variables of interest set to success 
    # sapply() goes over all the elements of the dimnames 
    # Finds numeric index in that dimension where the name == success 
    # We subset the resulting vector by tgt_margins 
    # (to only get the dimensions in our marginal table) 
    # Then, build a one-row index matrix (one column per dimension) 
    # Multiplying the indices would only locate the cell when success 
    # is the last level of every dimension 
    tgt_cell <- matrix(sapply(X = xtab_dnn, 
                              FUN = match, 
                              x = success)[tgt_margins], 
                       nrow = 1) 

    # Return as named list for ease of stacking 
    return(list(count = marginal[tgt_cell])) 
} 

# Doing a call of mapply() lets us get the results 
do.call(what = rbind.data.frame, 
     args = mapply(FUN = uFunc_GetMargins, 
         varvector = lst_combs, 
         MoreArgs = list(crosstab = xtab_freq, 
             success = "1"), 
         SIMPLIFY = FALSE, 
         USE.NAMES = TRUE)) 
#  count 
# A  52 
# B  47 
# C  66 
# AB  24 
# AC  30 
# BC  34 
# ABC 15

I have abandoned my previous solution that used aggregate.
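As a more compact sketch of the same idea, the cell in which every selected variable equals the success level can be addressed with a one-row character matrix (one column per dimension), which R matches against the dimnames of the table. The helper name count_joint and its arguments are hypothetical and are not part of the answer above.

```r
set.seed(12345)
df <- data.frame(A = rbinom(n = 100, size = 1, prob = 0.5),
                 B = rbinom(n = 100, size = 1, prob = 0.6),
                 C = rbinom(n = 100, size = 1, prob = 0.7))

# Full three-way frequency table
xtab_freq <- xtabs(~ A + B + C, data = df)

# Hypothetical helper: joint count for the variables named in `vars`
count_joint <- function(crosstab, vars, success = "1") {
    # Positions of the requested variables among the table's dimensions
    margins <- match(vars, names(dimnames(crosstab)))
    # Collapse the table onto those dimensions only
    marginal <- margin.table(crosstab, margin = margins)
    # One-row character matrix: the success label in every kept dimension
    cell <- matrix(rep(success, length(vars)), nrow = 1)
    unname(marginal[cell])
}

n_AC  <- count_joint(xtab_freq, c("A", "C"))
n_ABC <- count_joint(xtab_freq, c("A", "B", "C"))
```

Indexing by label rather than by position means the lookup does not depend on where the success level sits among the levels of each variable.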

Source

2016-06-28 14:47:17 TARehman
