Creating a co-occurrence matrix

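I'm trying to solve a co-occurrence matrix problem. I have a data file of transactions and items, and I want to build a matrix of the number of transactions in which each pair of items appears together.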

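I'm new to R programming, and I'm having fun discovering all of R's shortcuts instead of writing explicit loops (I used C years ago, and these days I stick to Excel macros and SPSS). I've looked through the solutions here but haven't found one that works (the closest is: Co-occurrence matrix using SAC? - but when I used projecting_tm it produced an error message, and I suspect the cbind was not successful in my case).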

Basically, I have a table like the following:

TrxID Items Quant 
Trx1 A 3 
Trx1 B 1 
Trx1 C 1 
Trx2 E 3 
Trx2 B 1 
Trx3 B 1 
Trx3 C 4 
Trx4 D 1 
Trx4 E 1 
Trx4 A 1 
Trx5 F 5 
Trx5 B 3 
Trx5 C 2 
Trx5 D 1, etc.

I want to create something like:

A B C D E F 
A 0 1 1 0 1 1 
B 1 0 3 1 1 0 
C 1 3 0 1 0 0 
D 1 1 1 0 1 1 
E 1 1 0 1 0 0 
F 0 1 1 1 0 0

Here is what I've done (you'll probably laugh at my rookie R approach):

library(igraph) 
library(tnet) 

trx <- read.table("FileName.txt", header=TRUE) 
transID <- t(trx[1]) 
items <- t(trx[2]) 

id_item <- cbind(items,transID) 
item_item <- projecting_tm(id_item, method="sum") 
item_item <- tnet_igraph(item_item,type="weighted one-mode tnet") 
item_matrix <-get.adjacency(item_item,attr="weight") 
item_matrix

As mentioned above, the cbind was probably unsuccessful, so projecting_tm couldn't give me any result.

Are there any alternative approaches, or corrections to my method?

Your help would be much appreciated!

Source

2012-11-08 jacatra

Related thread [here](http://stackoverflow.com/questions/14332233/using-graph-adjacency-in-r). – hhh

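I'm working with similar transaction data right now, and I just want to say thanks to @jacatra (the related post that hhh linked is also very useful). – EconomiCurtis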

There's a small error in the example you want to create - row B, column F says 0. It should say 1. That confused me for a while. – vagabond

I would use a combination of the reshape2 package and matrix algebra:

#read in your data 
dat <- read.table(text="TrxID Items Quant 
Trx1 A 3 
Trx1 B 1 
Trx1 C 1 
Trx2 E 3 
Trx2 B 1 
Trx3 B 1 
Trx3 C 4 
Trx4 D 1 
Trx4 E 1 
Trx4 A 1 
Trx5 F 5 
Trx5 B 3 
Trx5 C 2 
Trx5 D 1", header=T) 

#making the boolean matrix 
library(reshape2) 
dat2 <- melt(dat, id.vars = c("TrxID", "Items"))   #long format; Quant ends up in "value" 
w <- dcast(dat2, Items ~ TrxID, value.var = "value")   #items in rows, transactions in columns 
x <- as.matrix(w[,-1]) 
x[is.na(x)] <- 0 
x <- apply(x, 2, function(x) as.numeric(x > 0)) #recode as 0/1 
v <- x %*% t(x)         #item-by-item co-occurrence counts 
diag(v) <- 0          #replace diagonal 
dimnames(v) <- list(w[, 1], w[,1])    #name the dimensions 
v
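
To see why the matrix product gives co-occurrence counts, here is a minimal sketch on a toy 0/1 incidence matrix (the data and the names `x`, `v` and `manual_AB` are made up for illustration, not taken from the question). Entry (i, j) of `x %*% t(x)` sums, over transactions, the product of the two items' indicators, and that product is 1 only when both items are in the transaction.

```r
# Hypothetical incidence matrix: rows = items, columns = transactions
x <- matrix(c(1, 1, 0, 1,   # A is in T1, T2, T4
              1, 0, 1, 1,   # B is in T1, T3, T4
              0, 1, 1, 0),  # C is in T2, T3
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), paste0("T", 1:4)))

v <- x %*% t(x)                         # equivalent to tcrossprod(x)
manual_AB <- sum(x["A", ] * x["B", ])   # transactions containing both A and B
diag(v) <- 0                            # drop each item's own transaction count
v
```

Before the diagonal is zeroed, it holds the number of transactions each item appears in, which is why it is replaced.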

For a graph, perhaps...

library(igraph) 
g <- graph.adjacency(v, weighted=TRUE, mode ='undirected') 
g <- simplify(g) 
# set labels and degrees of vertices 
V(g)$label <- V(g)$name 
V(g)$degree <- degree(g) 
plot(g)

Source

2012-11-08 02:36:19

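This is actually quite easy and clean if you first create a bipartite graph, where the top nodes are the transactions and the bottom nodes are the items. Then you create a projection onto the bottom nodes.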

dat <- read.table(text="TrxID Items Quant 
Trx1 A 3 
Trx1 B 1 
Trx1 C 1 
Trx2 E 3 
Trx2 B 1 
Trx3 B 1 
Trx3 C 4 
Trx4 D 1 
Trx4 E 1 
Trx4 A 1 
Trx5 F 5 
Trx5 B 3 
Trx5 C 2 
Trx5 D 1", header=T) 

library(igraph) 
bip <- graph.data.frame(dat) 
V(bip)$type <- V(bip)$name %in% dat[,1] 

## sparse=TRUE is a good idea if you have a large matrix here 
v <- get.adjacency(bipartite.projection(bip)[[2]], attr="weight", sparse=FALSE) 

## Need to reorder if you want it alphabetically 
v[order(rownames(v)), order(colnames(v))] 

# A B C D E F 
# A 0 1 1 1 1 0 
# B 1 0 3 1 1 1 
# C 1 3 0 1 0 1 
# D 1 1 1 0 1 1 
# E 1 1 0 1 0 0 
# F 0 1 1 1 0 0

Source

2012-11-09 03:05:14

Using "dat" from either of the answers above, try crossprod and table:

V <- crossprod(table(dat[1:2])) 
diag(V) <- 0 
V 
#  Items 
# Items A B C D E F 
#  A 0 1 1 1 1 0 
#  B 1 0 3 1 1 1 
#  C 1 3 0 1 0 1 
#  D 1 1 1 0 1 1 
#  E 1 1 0 1 0 0 
#  F 0 1 1 1 0 0
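
As a sanity check on the output above, the following sketch recomputes a few entries by brute force, counting the transactions that contain both items of a pair. The helper `shared()` is a hypothetical name introduced here for illustration only.

```r
dat <- read.table(text = "TrxID Items Quant
Trx1 A 3
Trx1 B 1
Trx1 C 1
Trx2 E 3
Trx2 B 1
Trx3 B 1
Trx3 C 4
Trx4 D 1
Trx4 E 1
Trx4 A 1
Trx5 F 5
Trx5 B 3
Trx5 C 2
Trx5 D 1", header = TRUE)

V <- crossprod(table(dat[1:2]))
diag(V) <- 0

# Hypothetical helper: number of transactions that contain both item a and item b
shared <- function(a, b) {
  length(intersect(dat$TrxID[dat$Items == a], dat$TrxID[dat$Items == b]))
}

shared("B", "C")   # compare with V["B", "C"]
shared("B", "F")   # compare with V["B", "F"]
```

Both counts agree with the matrix, including the B/F entry that the comment under the question points out.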

Source

2014-03-26 12:23:44 A5C1D2H2I1M1N2O1R2T1

For efficiency, especially with sparse data, I would recommend using sparse matrices.

dat <- read.table(text="TrxID Items Quant 
Trx1 A 3 
Trx1 B 1 
Trx1 C 1 
Trx2 E 3 
Trx2 B 1 
Trx3 B 1 
Trx3 C 4 
Trx4 D 1 
Trx4 E 1 
Trx4 A 1 
Trx5 F 5 
Trx5 B 3 
Trx5 C 2 
Trx5 D 1", header=T) 

library("Matrix") 

# factors for indexing matrix entries and naming dimensions 
trx.fac <- factor(dat[,1]) 
itm.fac <- factor(dat[,2]) 

s <- sparseMatrix(
     as.numeric(trx.fac), 
     as.numeric(itm.fac), 
     dimnames = list(
       as.character(levels(trx.fac)), 
       as.character(levels(itm.fac))), 
     x = 1) 

# calculating co-occurrences 
v <- t(s) %*% s 

# setting transactions counts of items to zero 
diag(v) <- 0 
v

I gave each solution posted in this thread a try. None of them worked with large matrices (I was working with a 1,500 x 2,000,000 matrix).

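A bit off-topic: after computing a co-occurrence matrix, I usually want to compute the distance between individual items. Cosine similarity/distance can be computed efficiently on the co-occurrence matrix like this: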

# cross-product of vectors (numerator) 
num <- v %*% v 

# square root of square sum of each vector (used for denominator) 
srss <- sqrt(rowSums(v^2)) 

# denominator 
den <- srss %*% t(srss) 

# cosine similarity 
v.cos.sim <- num/den 

# cosine distance 
v.cos.dist <- 1 - v.cos.sim
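
A small dense check of the same calculation, assuming a made-up symmetric 3 x 3 co-occurrence matrix (the values and the names `cos_sim` and `direct` are illustrative only). It compares the matrix form with the textbook definition for one pair of rows.

```r
# Hypothetical symmetric co-occurrence matrix
v <- matrix(c(0, 1, 1,
              1, 0, 3,
              1, 3, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

num <- v %*% t(v)                      # dot products between all pairs of rows
norms <- sqrt(rowSums(v^2))            # Euclidean norm of each row
cos_sim <- num / outer(norms, norms)   # cosine similarity matrix
cos_dist <- 1 - cos_sim                # cosine distance matrix

# Direct definition for the rows of A and B
direct <- sum(v["A", ] * v["B", ]) /
  (sqrt(sum(v["A", ]^2)) * sqrt(sum(v["B", ]^2)))
```

One caveat: an item that never co-occurs with anything has an all-zero row, so its norm is 0 and the division yields NaN for that row and column.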

Source

2014-07-08 08:41:12

I would use xtabs for this:

dat <- read.table(text="TrxID Items Quant 
Trx1 A 3 
Trx1 B 1 
Trx1 C 1 
Trx2 E 3 
Trx2 B 1 
Trx3 B 1 
Trx3 C 4 
Trx4 D 1 
Trx4 E 1 
Trx4 A 1 
Trx5 F 5 
Trx5 B 3 
Trx5 C 2 
Trx5 D 1", header=T) 


library(Matrix)   # attach sparse-matrix methods for crossprod() and diag<- 
term_doc <- xtabs(~ TrxID + Items, data=dat, sparse = TRUE) 
co_occur <- crossprod(term_doc, term_doc) 
diag(co_occur) <- 0 
co_occur

I threw in sparse = TRUE to indicate that this can work for very large data sets.
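
One thing to watch for with the table/xtabs approaches: they count rows, so if the same item is listed more than once in a transaction, the cross-product is inflated. The sample data in the question has no such duplicates, so this is only a sketch with made-up data (`dup`, `inflated` and `fixed` are hypothetical names) showing how recoding to presence/absence first avoids that.

```r
# Hypothetical data: item A is listed twice in Trx1
dup <- data.frame(TrxID = c("Trx1", "Trx1", "Trx1", "Trx2"),
                  Items = c("A", "A", "B", "B"))

tab <- xtabs(~ TrxID + Items, data = dup)   # cell counts; A appears twice in Trx1
inflated <- crossprod(tab)                  # A/B entry counts the duplicate row

binary <- (tab > 0) * 1                     # recode counts to 0/1 presence
fixed <- crossprod(binary)                  # A/B entry = transactions with both
diag(fixed) <- 0
fixed
```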

Source

2015-04-25 01:38:17
