2014-01-27 94 views
2

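I have data about customers and the stores they have visited (at least once). I am interested in how often each combination of stores occurs, taken two at a time.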

Customer | Store
1        | A
1        | B
2        | A
2        | C
3        | A
4        | A
4        | B
4        | C

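I want to know how many customers visited each combination of 2 stores.

How can I transform the data structure above (with R) to obtain the following structure?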

Store 1 | Store 2 | Nb_Customer
A       | B       | 2  (Customers 1 & 4 visited stores A & B)
A       | C       | 2  (Customers 2 & 4 visited stores A & C)

Edit, regarding Henrik's solution: as you can see, I have a problem with 'pair'.

# number of visits for each customer in each store 
> df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C')) 
> # number of visits for each customer in each store 
> tt <- with(df, table(df$Customer, df$Store)) 
> tt 

    A B C 
    1 1 1 0 
    2 1 0 1 
    3 1 0 0 
    4 1 1 1 
> 
> # number of stores 
> n <- with(df, length(unique(df$Store))) 
> n 
[1] 3 
> 
> # all pairs of column numbers, to be selected from the table tt 
> cols <- with(df, combn(n, 2)) 
> cols 
    [,1] [,2] [,3] 
[1,] 1 1 2 
[2,] 2 3 3 
> 
> # pairs of stores 
> pair <- t(with(df, combn(unique(df$Store), 2))) 
> pair 
    [,1] [,2] 
[1,] "A" "B" 
[2,] "1" "3" 
[3,] "2" "3" 

Answers

2

Another possibility:

# number of visits for each customer in each store 
tt <- with(df, table(Customer, Store)) 
tt 

# number of stores 
n <- with(df, length(unique(Store))) 
n 

# all pairs of column numbers, to be selected from the table tt 
cols <- with(df, combn(n, 2)) 
cols 

# pairs of stores 
pair <- t(with(df, combn(unique(Store), 2))) 
pair 

# select pairs of columns from tt 
# count the customers (rows) whose visits to the two selected stores sum to more than one 
# combine the counts with the store names from 'pair' into a data frame 
ll <- lapply(seq(ncol(cols)), function(x){ 
    tt2 <- tt[ , cols[ , x]] 
    n_cust <- sum(rowSums(tt2) > 1) 
    data.frame(store1 = pair[x, 1], store2 = pair[x, 2], n_cust = n_cust) 
}) 
ll 

# convert list to data frame 
df2 <- do.call(rbind, ll) 
df2 

#   store1 store2 n_cust 
# 1      A      B      2 
# 2      A      C      2 
# 3      B      C      1 
+0

I think there is an error with 'pair'. Wrong store names appear at this step (numbers instead of letters) – Ophelie

+0

Actually, I get numbers as strings ("2", "3"). I have edited my post with the results of your solution – Ophelie

+0

Maybe your 'Store' is a 'factor'. Check 'str(name_of_your_data_frame)'. It works fine with 'Store' as 'character'. – Henrik
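To illustrate the comment above, here is a minimal sketch. It assumes the question's data frame was built with `stringsAsFactors = TRUE` (the default before R 4.0.0), so that 'Store' is a factor; that setting is an assumption, since the thread does not show the output of 'str()'. Converting the column to character before calling 'combn' keeps the store labels.

```r
# Rebuild the question's data with Store stored as a factor
# (assumption: this mirrors the pre-4.0.0 default behaviour of data.frame()).
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"),
                 stringsAsFactors = TRUE)
str(df)  # Store is reported as a Factor with 3 levels

# Convert to character so that combn() works on the labels themselves
df$Store <- as.character(df$Store)

# All unordered pairs of stores, one pair per row
pair <- t(combn(unique(df$Store), 2))
pair
```

With 'Store' as character, each row of 'pair' holds two store labels rather than level codes.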

1

Maybe this is not the most efficient way to do it, but it works:

df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C')) 

cmb <- t(combn(unique(as.character(df$Store)),m=2)) 
count <- rep(0,nrow(cmb)) 

for (i in unique(df$Customer)){ 
    for (j in 1:nrow(cmb)){ 
    count[j] <- count[j]+as.numeric(all(cmb[j,] %in% df$Store[df$Customer==i])) 
    } 
} 

res <- data.frame(Store1=cmb[,1], Store2=cmb[,2], Nb_customer=count) 

  Store1 Store2 Nb_customer 
1      A      B           2 
2      A      C           2 
3      B      C           1 

Edit:

Using association rules, you can do it like this:

# load the arules package 
library(arules) 
#original data frame 
df <- data.frame(Customer=c(1,1,2,2,3,4,4,4), Store=c('A', 'B', 'A', 'C', 'A', 'A', 'B', 'C')) 

# create list 
a_list <- lapply(unique(df$Customer),function(x)df$Store[df$Customer==x]) 

## set transaction names 
names(a_list) <- paste("Tr",unique(df$Customer), sep = "") 
a_list 

## coerce into transactions 
trans <- as(a_list, "transactions") 

# create association rules 
rules <- apriori(trans, parameter=list(minlen=2, maxlen=2, ext=TRUE, originalSupport=FALSE)) 
# calculate frequency of pairs of stores 
rules@quality$abs_support <- rules@quality$support*length(trans) 
inspect(rules) 


  lhs    rhs support confidence lhs.support lift abs_support 
1 {B} => {A}     0.5          1         0.5    1           2 
2 {C} => {A}     0.5          1         0.5    1           2 

abs_support is the number of co-occurrences.

+0

I have hundreds of stores and thousands of customers, and it takes too much time with loops. How can I vectorize this solution? – Ophelie
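One possible way to avoid the loops, offered as a sketch rather than as anything proposed in this thread, is to build a customer-by-store indicator matrix and take its cross product: entry [i, j] of the result is the number of customers who visited both store i and store j. The names 'm', 'co', 'idx' and 'res' below are illustrative, and how this scales to hundreds of stores and thousands of customers has not been measured here.

```r
df <- data.frame(Customer = c(1, 1, 2, 2, 3, 4, 4, 4),
                 Store = c("A", "B", "A", "C", "A", "A", "B", "C"))

# Customer x store indicator matrix: 1 if the store was visited at least once
m <- 1 * (table(df$Customer, df$Store) > 0)

# t(m) %*% m: entry [i, j] counts the customers who visited both i and j
co <- crossprod(m)

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(co), arr.ind = TRUE)
res <- data.frame(Store1 = rownames(co)[idx[, "row"]],
                  Store2 = colnames(co)[idx[, "col"]],
                  Nb_customer = co[idx])
res
```

The diagonal of 'co' holds the number of customers per store, which is why it is excluded when listing pairs.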

+0

Then use association rules, which are very fast: http://www.rdatamining.com/examples/association-rules – Zbynek

+0

I don't understand the link between this and association rules – Ophelie

0

Something like this?

d<-data.frame(v1=c(1,1,2,2,3,4,4,4),v2=c("A","B","A","C","A","A","B","C")) 
df<-as.data.frame.matrix(table(d)) 
which(df$A==1 & df$B==1) 
which(df$A==1 & df$C==1) 
which(df$B==1 & df$C==1) 
+0

I have many more stores and combinations. I can't list them all like you do. – Ophelie

+1

Do you want to count customers who visited a store at least once, or exactly once? – DatamineR

+0

At least once; I will edit the description – Ophelie
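The distinction matters once the data can contain repeat visits. The sketch below uses hypothetical data (not from the question) in which customer 1 visits store A twice and never visits B. Summing raw counts over a pair of columns then counts that customer, whereas converting the table to visited/not-visited indicators first does not.

```r
# Hypothetical data: customer 1 visits store A twice and never visits B
df <- data.frame(Customer = c(1, 1, 2, 2),
                 Store = c("A", "A", "A", "B"))
tt <- table(df$Customer, df$Store)

# Raw counts: customer 1 has a row sum of 2 for (A, B) despite never visiting B
raw_count <- sum(rowSums(tt[, c("A", "B")]) > 1)
raw_count

# Indicators: a customer counts only if both stores were visited at least once
visited <- tt > 0
ind_count <- sum(visited[, "A"] & visited[, "B"])
ind_count
```

Here 'raw_count' is 2 and 'ind_count' is 1; only customer 2 actually visited both stores.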