Frequency of combinations (pairs, 2 by 2)

I have data about customers and the stores they visited (at least once).

Customer | Store
1        | A
1        | B
2        | A
2        | C
3        | A
4        | A
4        | B
4        | C

For each pair of stores, I want to know how many customers visited both stores.

How can I transform the data structure above (in R) to obtain the following structure?

Store 1 | Store 2 | Nb_Customer
A       | B       | 2 (customers 1 & 4 visited stores A & B)
A       | C       | 2 (customers 2 & 4 visited stores A & C)
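One way to get exactly this layout in base R is a self-join: merge the data with itself on Customer, keep each unordered pair of stores once, and count the customers per pair. This is a minimal sketch, assuming Store is held as character; the names visits and pairs are illustrative only.

```r
# example data from the question, with Store kept as character
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = FALSE)

# "at least once": drop repeated visits of one customer to the same store
visits <- unique(df[, c("Customer", "Store")])

# self-join on Customer: every (Store.x, Store.y) combination per customer
pairs <- merge(visits, visits, by = "Customer")

# keep each unordered pair once and drop a store paired with itself
pairs <- pairs[pairs$Store.x < pairs$Store.y, ]

# number of customers per pair of stores
res <- aggregate(Customer ~ Store.x + Store.y, data = pairs, FUN = length)
names(res) <- c("Store1", "Store2", "Nb_Customer")
res
```

Because repeated visits are removed first, a customer who went to the same store several times is still counted once per pair.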

Edit regarding Henrik's solution: as you can see, I have a problem with 'pair'.

> df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C'))
> # number of visits for each customer in each store
> tt <- with(df, table(df$Customer, df$Store))
> tt
    A B C
  1 1 1 0
  2 1 0 1
  3 1 0 0
  4 1 1 1
>
> # number of stores
> n <- with(df, length(unique(df$Store)))
> n
[1] 3
>
> # all pairs of column numbers, to be selected from the table tt
> cols <- with(df, combn(n, 2))
> cols
     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3
>
> # pairs of stores
> pair <- t(with(df, combn(unique(df$Store), 2)))
> pair
     [,1] [,2]
[1,] "A"  "B"
[2,] "1"  "3"
[3,] "2"  "3"


2014-01-27 Ophelie

Another possibility:

# example data from the question; Store is kept as character, not factor
df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C'), stringsAsFactors = FALSE)

# number of visits for each customer in each store 
tt <- with(df, table(Customer, Store)) 
tt 

# number of stores 
n <- ncol(tt) 
n 

# all pairs of column numbers, to be selected from the table tt 
cols <- combn(n, 2) 
cols 

# pairs of stores, taken from the column names of tt so that names and columns line up 
pair <- t(combn(colnames(tt), 2)) 
pair 

# select pairs of columns from tt 
# count the customers who visited both stores of the pair (each at least once) 
# combine the counts with the store names from 'pair' into a data frame 
ll <- lapply(seq(ncol(cols)), function(x){ 
    tt2 <- tt[ , cols[ , x]] 
    n_cust <- sum(rowSums(tt2 > 0) == 2) 
    data.frame(store1 = pair[x, 1], store2 = pair[x, 2], n_cust = n_cust) 
}) 
ll 

# convert list to data frame 
df2 <- do.call(rbind, ll) 
df2 

#   store1 store2 n_cust 
# 1      A      B      2 
# 2      A      C      2 
# 3      B      C      1
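The lapply over column pairs can also be folded into a single combn() call, because combn() accepts a FUN argument that is applied to each combination. A sketch of that variant; it rebuilds the tt table so the block runs on its own.

```r
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = FALSE)

# customers x stores table of visit counts
tt <- with(df, table(Customer, Store))

# store pairs, taken from the column names so labels and columns line up
pair <- t(combn(colnames(tt), 2))

# for each pair p, count customers with at least one visit to both stores
n_cust <- combn(colnames(tt), 2,
                FUN = function(p) sum(tt[, p[1]] > 0 & tt[, p[2]] > 0))

df2 <- data.frame(store1 = pair[, 1], store2 = pair[, 2], n_cust = n_cust,
                  stringsAsFactors = FALSE)
df2
```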


2014-01-27 13:54:07 Henrik

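I think there is an error in 'pair': the wrong store names appear at this step (numbers instead of letters). – Ophelie

Actually I get numbers as strings ("2", "3"). I edited my post with the result of your solution. – Ophelie

Perhaps your 'Store' is a 'factor'. Check 'str(name of your data frame)'. It works fine with 'Store' as 'character'. – Henrik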
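To apply this suggestion, the Store column can be converted to character before calling combn(); str() shows which type the column currently has. A short sketch: the factor is created explicitly here to mimic data that was read in with strings as factors.

```r
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = factor(c("A", "B", "A", "C", "A", "A", "B", "C")))
str(df)   # Store is listed as a Factor

# convert the factor to character so combn() works on the store names
df$Store <- as.character(df$Store)
str(df)   # Store is now listed as chr

pair <- t(combn(unique(df$Store), 2))
pair
```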

This may not be the most efficient way to do it, but it works:

df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C')) 

cmb <- t(combn(unique(as.character(df$Store)),m=2)) 
count <- rep(0,nrow(cmb)) 

for (i in unique(df$Customer)){ 
    for (j in 1:nrow(cmb)){ 
    count[j] <- count[j]+as.numeric(all(cmb[j,] %in% df$Store[df$Customer==i])) 
    } 
} 

res <- data.frame(Store1=cmb[,1], Store2=cmb[,2], Nb_customer=count) 
res

  Store1 Store2 Nb_customer 
1      A      B           2 
2      A      C           2 
3      B      C           1
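A loop-free variant of the same idea enumerates the store pairs within each customer's own set of stores and then tabulates them, so only pairs that actually occur are generated. A sketch on the same data; per_customer and all_pairs are illustrative names.

```r
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = FALSE)

# for each customer: sorted distinct stores, then all pairs among them
per_customer <- lapply(split(df$Store, df$Customer), function(s) {
    s <- sort(unique(s))
    if (length(s) < 2) return(NULL)   # a single store forms no pair
    t(combn(s, 2))
})
all_pairs <- do.call(rbind, per_customer)   # NULL entries are skipped

# count how many customers produced each pair, drop pairs that never occur
res <- as.data.frame(table(Store1 = all_pairs[, 1], Store2 = all_pairs[, 2]),
                     responseName = "Nb_customer", stringsAsFactors = FALSE)
res <- res[res$Nb_customer > 0, ]
res
```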

Edit:

Using association rules, you can do it like this:

# load library arules 
library(arules) 
#original data frame 
df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C')) 

# create list: one character vector of store names per customer 
a_list <- lapply(unique(df$Customer),function(x) as.character(df$Store[df$Customer==x])) 

## set transaction names 
names(a_list) <- paste("Tr",unique(df$Customer), sep = "") 
a_list 

## coerce into transactions 
trans <- as(a_list, "transactions") 

# create association rules 
rules <- apriori(trans, parameter=list(minlen=2, maxlen=2, ext=TRUE, originalSupport=FALSE)) 
# calculate frequency of pairs of stores (relative support times number of transactions) 
rules@quality$abs_support <- rules@quality$support*length(trans) 
inspect(rules) 


    lhs    rhs support confidence lhs.support lift abs_support 
1 {B} => {A}     0.5          1         0.5    1           2 
2 {C} => {A}     0.5          1         0.5    1           2

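abs_support is the number of co-occurrences, i.e. the number of customers who visited both stores. The pair B-C does not appear because its rules have a confidence of 0.5, below the default minimum confidence of 0.8 used by apriori.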


2014-01-27 12:05:49 Zbynek

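I have hundreds of stores and thousands of customers, and the loop takes too much time. How can I vectorize this solution? – Ophelie

Then use association rules, which are very fast: http://www.rdatamining.com/examples/association-rules – Zbynek

I don't see the connection between this and association rules. – Ophelie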
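For many stores and customers, all pairwise counts can come from one matrix product (this technique is not from the thread). With a 0/1 customer-by-store matrix m, crossprod(m), which equals t(m) %*% m, holds in cell [i, j] the number of customers who visited both store i and store j. A sketch under that assumption:

```r
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = FALSE)

# customers x stores 0/1 matrix: 1 if the customer visited the store at least once
m <- unclass(table(df$Customer, df$Store) > 0) * 1

# stores x stores matrix of co-visit counts
co <- crossprod(m)

# each unordered pair of different stores once: upper triangle, no diagonal
idx <- which(upper.tri(co), arr.ind = TRUE)
res <- data.frame(Store1 = rownames(co)[idx[, "row"]],
                  Store2 = colnames(co)[idx[, "col"]],
                  Nb_Customer = co[idx],
                  stringsAsFactors = FALSE)
res
```

The diagonal, diag(co), holds the number of customers per store, which is a quick way to sanity-check the result.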

Something like this?

d<-data.frame(v1=c(1,1,2,2,3,4,4,4),v2=c("A","B","A","C","A","A","B","C")) 
df<-as.data.frame.matrix(table(d)) 
which(df$A==1 & df$B==1) 
which(df$A==1 & df$C==1) 
which(df$B==1 & df$C==1)


2014-01-27 12:07:19 DatamineR

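I have many more stores and combinations; I can't list them all the way you did. – Ophelie

Do you want to count customers who visited a store at least once, or exactly once? – DatamineR

At least once; I'll edit the description. – Ophelie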
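To generalize the which(...) idea to every pair without listing them by hand, and to keep memory low with thousands of customers, one option is to build the 0/1 matrix as a sparse matrix with the Matrix package (a recommended package shipped with R) and take its cross-product. A sketch, assuming Matrix is installed; the result is made dense on the assumption that a few hundred stores fit comfortably in memory.

```r
library(Matrix)

df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = FALSE)

# one row per (customer, store) visited at least once
visits <- unique(df[, c("Customer", "Store")])
cust  <- factor(visits$Customer)
store <- factor(visits$Store)

# sparse customers x stores 0/1 matrix
m <- sparseMatrix(i = as.integer(cust), j = as.integer(store), x = 1,
                  dimnames = list(levels(cust), levels(store)))

# stores x stores co-visit counts, converted to a dense matrix
co <- as.matrix(crossprod(m))

# unordered pairs of different stores, keeping only pairs with at least one customer
idx <- which(upper.tri(co), arr.ind = TRUE)
res <- data.frame(Store1 = rownames(co)[idx[, "row"]],
                  Store2 = colnames(co)[idx[, "col"]],
                  Nb_Customer = co[idx],
                  stringsAsFactors = FALSE)
res <- res[res$Nb_Customer > 0, ]
res
```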
