三角形計數確實很困難,而且計算成本很高。也許這是瞭解原因的一個很好的起點:Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning。 (n *(n-1)),並且繼續週期以查看n的鄰居的鄰居的鄰居是否是n:(n *(n-1)),其中n = 1, )(n-1)(n-1),對於10000 n幾乎爲10^16。有一百萬個節點,這些循環變得愚蠢,但是對於你的10000個人來說,如果你想暴力破解,你應該沒有任何問題:)
你提到過你用C#編寫代碼,並且它可以用於C )有一個很好的計算Gabor Csardi寫的三角形的算法。它在我的10000個節點的隨機圖中計算了130萬個三角形,在一臺5年的筆記本計算機上,它在1.3秒內計算了100萬個邊緣:) Gabor Csardi會是個問問的人:)
就不同的編程方法而言,應該查看您存儲網絡的數據。如果存儲在鄰接矩陣中,則循環的數量是固定的,但是在三個邊的網絡的邊緣列表中,循環的數量是三的倍數,與節點的數量無關。你可以問一個節點的鄰居的邊界列表,而不必測試每一個i-> j的組合。
我在R中編寫了一個教學腳本來說明方法,並測量不同算法的速度,這是一個非常基本的方法。在這裏使用R有很多固有的速度問題(邊緣列表版本完全被太多邊緣所淹沒),但是我認爲代碼示例應該能夠得到一些關於如何思考速度的想法,迫使三角形計數。這是在R,並不是非常整潔,但很好評論。我希望你能打破語言障礙。
所有最優秀的。
# Counting triangles in a random graph using igraph and two different
# and almost equally stupid approaches looping through the 1) adjacency
# matrix and 2) the edge-list in R.
# Use igraph and these configs
library(igraph)
V <- 100
E <- 1700
# This is the random graph that we will use
g <- erdos.renyi.game(type="gnm", n=V, p=E, directed=FALSE, loops=FALSE)
# igraph has such a fast algorythm. Long live Gabor Csardi!
benchmark <- proc.time()["elapsed"]
triangle.count <- sum(count_triangles(g)/3)
gabor.Csardi.benchmark <- proc.time()["elapsed"] - benchmark
# For not to big networks we can plot them to get a basic feel
if(length(V(g)) < 100){
V(g)$size <- 5
plot(g)
}
# The adjacency matrix approach will have to deal with a crazy
# ammount of looping through pairs of matrix combinations to
# find links:
# We'll loop through each node to check it's participation in triangles
# notice that a triangle ijk will be participated in by nodes
# i, j, and k, and that each of those nodes have two triangular counts.
# All in all the structures ijk, ikj, jik, jki, kij, kji are each counted
# but shall be returned as 1 triangle. We will therefore devide our
# search-result by 6 later on.
# Store our progess in this matrix to look at how we did later on
progress <- matrix(0, nrow=length(V(g)), ncol=8)
# Loop through all nodes to find triangles in an adjacency matrix
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in 1:length(V(g))){
# Node i has connections to these nodes:
i.neighbors <- as.vector(neighborhood(g, 1, nodes=i)[[1]])
i.neighbors <- setdiff(i.neighbors, c(i)) # i should not be part of its own neighborhood
# for each i, tri is the number of triangles that i is involved in
# using any j or any k. For a triangle {1,2,3}, tri will be 2 for
# i==1, since i is part of both triangles {1,2,3} and {1,3,2}:
tri <- 0
for(j in i.neighbors)
{
# Node j has connections to these nodes:
j.neighbors <- as.vector(neighborhood(g, 1, nodes=j)[[1]])
j.neighbors <- setdiff(j.neighbors, c(j)) # j should not be part of its own neighborhood
# Were any of j's neighbors also a neighbor of i?
k <- intersect(i.neighbors, j.neighbors)
tri <- tri + length(k)
}
# Save our findings to the progress matrix
progress[i,1] <- tri
progress[i,7] <- proc.time()["elapsed"] - benchmark
}
progress[,2] <- sapply(1:length(progress[,1]), function(x) sum(progress[,1][1:x]))
progress[,3] <- round(progress[,2]/6, digits=2)
# The edge-list approach uses a list of all edges in the network to loop through instead
# Here, I suppose, a lot of the extra speed could arise from R being better at looping
# with lapply() and at finding data in a data.frame that the brute-force loop above is.
el <- as.data.frame(as.matrix(get.edgelist(g,)))
# This is ugly. Make the edgelist contain all edges as both i->j and j->i. In
# the igraph object, they are only given as low i to high j by get.edgelist()
el.rev <- data.frame(el[,2], el[,1])
names(el) <- names(el.rev) <- c("i","j")
el <- rbind(el, el.rev)
# these nodes are connected (we'd only need to bother abouth non isolates)
nodes <- sort(unique(c(el$i, el$j)))
tri <- 0
# Loop through connected nodes to find triangles in edge-list
benchmark <- proc.time()["elapsed"] # Measure time for this loop
for(i in nodes){
i.neighbors <- el[el$i==i,]$j
# i's neighbors are the $j:s of the edgelist where $i:s are i.
k.list <- unlist(lapply(i.neighbors, function(x) intersect(i.neighbors,el[el$i==x, ]$j)))
# lists nodes that can be a k in an ijk-triangle for each of i's neighboring j:s
# If 1 has neighbors 2 and 3, el[el$i==x, ]$j) will be first, the neighbors of 2 and then
# the neighbors of 3. When intersected with the neighbors of i, k:s will be found. If
# {1,2,3} is a triangle one k will be 3 for {i=1, j=2}, and another k will be 2 for {i=1, j=3}
# k.list might be NULL
tri.for.this.i <- (as.numeric(length(k.list))/2)
# Here we devide by two since i can be in a triangle with j and k lik {ijk} and {ikj}
# We will later have to devide by 3 more, since each triangle will be counted for
# each node i that we loop through
# Save the counting to the progress
tri <- tri.for.this.i + tri
progress[i,4] <- as.numeric(tri.for.this.i)
mm <- c(mm, i)
progress[i,8] <- proc.time()["elapsed"] - benchmark
}
progress[,5] <- sapply(1:length(progress[,4]), function(x) sum(progress[,4][1:x]))
progress[,6] <- round(progress[,5]/3, digits=2)
# Fix the results into a usable format
results <- data.frame(c("igraph", "adjacency-loop", "edge-loop"),
c(triangle.count, max(progress[,3]), max(progress[,6])),
c(gabor.Csardi.benchmark, (max(progress[,7]) - min(progress[,7])), (max(progress[,8]) - min(progress[,8]))))
row.names(results) <- c("igraph", "Adjacensy-loop", "Edge-loop")
names(results) <- c("Routine", "Triangle count", "Execution-time")
# Now we have run two approaches of more or less the same thing.
# Not only should the igraph triangle.count, and the two loops
# be identical, but the progress of the two methods should too.
progress[,3] == progress[,6]
plot(progress[,6], type="l", col="blue")
lines(progress[,7], col="green")
# Look at the result:
View(results)
檢查這對於C程序執行力度[三角形的在給定的圖形數目(http://www.msccomputerscience.com/2014/04/wap-to-find-number-of-triangle-in- given.html) – ARJUN