計算圖中三角形數量的有效算法是什麼？

計算無向圖中三角形數量的有效算法是什麼（其中圖形是一組頂點和邊）？我一直在搜索谷歌，連續三天閱讀我的書架，每天讀數小時。計算圖中三角形數量的有效算法是什麼？

這是一個家庭作業的作業，我需要這樣一個算法，但是開發它並不包括任務上的任何內容。預計我們可以簡單地從外部資源中找到這樣一種算法，但我已經走到了盡頭。

爲了澄清，圖中的三角形是長度爲三的循環。訣竅是它需要在最多有10,000個節點的頂點集上工作。

我目前在C＃中工作，但更關心解決此問題的一般方法，而不是複製和粘貼代碼。

在較高水平，我嘗試迄今包括：

廣度，可以跟蹤長度爲三的所有獨特的循環優先搜索。這對我來說似乎是個好主意，但我無法實現它的功能
在圖中的所有節點上循環查看三個頂點是否共享邊。這對於較大的數據集來說運行時間太慢了。爲O（n^3）。

該算法本身是計算聚類係數的一部分。

來源

2011-09-13 XBigTK13X

檢查這對於C程序執行力度[三角形的在給定的圖形數目（http://www.msccomputerscience.com/2014/04/wap-to-find-number-of-triangle-in- given.html） – ARJUN

這個問題和矩陣乘法一樣困難。請參閱reference。

你知道關於圖表的任何事嗎？它們稀疏嗎？如果沒有，我認爲你不會比O（n^3）做得更好。

來源

2011-09-13 04:02:39

它們確實是稀疏圖形，但出於測試目的，我正在密集圖上運行分析。 – XBigTK13X

取決於您的圖表如何表示。

如果有一個鄰接矩陣A，三角形的數量應該是tr（A^3）/ 6，換句話說，是對角線元素總和的1/6（這個分割負責定位和旋轉）。

如果您有鄰接列表，只需從每個節點開始並執行深度3搜索。計算你到達該節點的頻率 - >再次除以6。

來源

2011-09-16 16:06:57

在無向圖中，您只需要進行深度二搜索，然後檢查是否有深度二節點與深度一一匹配。 – han

所以我正在做一個鄰接表實現，你的回答對我有幫助，但是你能解釋爲什麼你除以6嗎？我明白這是爲了消除冗餘，但你能向我解釋你是如何到達6的？在我看來，每個三角形將被計數3次（每個可能的源頂點一次），所以爲什麼不除以3？ –

@JarrydGoodman A^3的每個對角線值是從一個節點到它自身的長度爲3的路徑的數量。因此，你在一個節點上計算每個三角形兩次（因爲你有兩個方向），並且在所有三個節點上計數六次。 – Yfiua

如果您不關心三角形的確切數量，那麼有一個非常簡單的流式算法可以提供無偏估計。有關說明，請參閱here。

來源

2013-12-06 10:23:48 adrianN

您將需要深度優先搜索。該算法將是：

1）對於當前節點詢問所有未訪問的相鄰節點

2）爲每個節點上運行深度兩個檢查，看是否在深度2的節點是您的當前節點從步驟一個

3）標記當前節點作爲化妝訪問

4），每個未訪問的鄰接節點的當前節點（1 1）和運行相同的算法

來源

2013-12-06 10:59:25 evhen14

我有着完全相同的算法，但是當你說「對於當前節點請求所有未訪問的相鄰節點」時，你究竟是什麼意思，因爲一個節點由x和y座標組成。在考慮其他節點作爲相鄰節點時，我們是否應該根據當前節點的x和y座標來決定？例如（1,2），（2,3），（3,1），（3,2），（1,3）哪些節點與哪個節點相鄰？ – Harshdeep

我認爲同樣的事情，但以不同的方式應用來自任何節點的dfs（使用堆棧），並且保持父節點的軌跡ie（從節點出現的位置）也將節點標記爲遍歷並將節點的邊緣從所有節點移除那麼如果在dfs期間發現遍歷的節點，則檢查該節點的父節點（即遍歷的節點），並且如果父節點的父節點相同則意味着存在三角形，則從堆棧中彈出該節點。 –

如這裏引用：https://math.stackexchange.com/a/117030

找到並計算三角形的最快算法依賴於快速矩陣乘積並具有O（nω）時間複雜度，其中ω< 2.376是快速矩陣乘積指數。然而，這種方法導致θ（n2）空間複雜度。

來源

2015-09-10 12:16:07

三角形計數確實很困難，而且計算成本很高。也許這是瞭解原因的一個很好的起點：Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning。（n *（n-1）），並且繼續週期以查看n的鄰居的鄰居的鄰居是否是n：（n *（n-1）），其中n = 1，）（n-1）（n-1），對於10000 n幾乎爲10^16。有一百萬個節點，這些循環變得愚蠢，但是對於你的10000個人來說，如果你想暴力破解，你應該沒有任何問題:)

你提到過你用C＃編寫代碼，並且它可以用於C ）有一個很好的計算Gabor Csardi寫的三角形的算法。它在我的10000個節點的隨機圖中計算了130萬個三角形，在一臺5年的筆記本計算機上，它在1.3秒內計算了100萬個邊緣:) Gabor Csardi會是個問問的人:)

就不同的編程方法而言，應該查看您存儲網絡的數據。如果存儲在鄰接矩陣中，則循環的數量是固定的，但是在三個邊的網絡的邊緣列表中，循環的數量是三的倍數，與節點的數量無關。你可以問一個節點的鄰居的邊界列表，而不必測試每一個i-> j的組合。

我在R中編寫了一個教學腳本來說明方法，並測量不同算法的速度，這是一個非常基本的方法。在這裏使用R有很多固有的速度問題（邊緣列表版本完全被太多邊緣所淹沒），但是我認爲代碼示例應該能夠得到一些關於如何思考速度的想法，迫使三角形計數。這是在R，並不是非常整潔，但很好評論。我希望你能打破語言障礙。

所有最優秀的。

# Counting triangles in a random graph using igraph and two different 
# and almost equally stupid approaches looping through the 1) adjacency 
# matrix and 2) the edge-list in R. 

# Use igraph and these configs 
library(igraph) 
V <- 100 
E <- 1700 

# This is the random graph that we will use 
g <- erdos.renyi.game(type="gnm", n=V, p=E, directed=FALSE, loops=FALSE) 

# igraph has such a fast algorythm. Long live Gabor Csardi! 
benchmark <- proc.time()["elapsed"] 
     triangle.count <- sum(count_triangles(g)/3) 
gabor.Csardi.benchmark <- proc.time()["elapsed"] - benchmark 

# For not to big networks we can plot them to get a basic feel 
if(length(V(g)) < 100){ 
     V(g)$size <- 5 
     plot(g) 
} 

# The adjacency matrix approach will have to deal with a crazy 
# ammount of looping through pairs of matrix combinations to 
# find links: 

# We'll loop through each node to check it's participation in triangles 
# notice that a triangle ijk will be participated in by nodes 
# i, j, and k, and that each of those nodes have two triangular counts. 
# All in all the structures ijk, ikj, jik, jki, kij, kji are each counted 
# but shall be returned as 1 triangle. We will therefore devide our 
# search-result by 6 later on. 

# Store our progess in this matrix to look at how we did later on 
progress <- matrix(0, nrow=length(V(g)), ncol=8) 

# Loop through all nodes to find triangles in an adjacency matrix 
benchmark <- proc.time()["elapsed"] # Measure time for this loop 
for(i in 1:length(V(g))){ 
     # Node i has connections to these nodes: 
     i.neighbors <- as.vector(neighborhood(g, 1, nodes=i)[[1]]) 
     i.neighbors <- setdiff(i.neighbors, c(i)) # i should not be part of its own neighborhood 

     # for each i, tri is the number of triangles that i is involved in 
     # using any j or any k. For a triangle {1,2,3}, tri will be 2 for 
     # i==1, since i is part of both triangles {1,2,3} and {1,3,2}: 
     tri <- 0 

     for(j in i.neighbors) 
     { 
       # Node j has connections to these nodes: 
       j.neighbors <- as.vector(neighborhood(g, 1, nodes=j)[[1]]) 
       j.neighbors <- setdiff(j.neighbors, c(j)) # j should not be part of its own neighborhood 

       # Were any of j's neighbors also a neighbor of i? 
       k <- intersect(i.neighbors, j.neighbors) 

       tri <- tri + length(k) 
     } 

     # Save our findings to the progress matrix 
     progress[i,1] <- tri 
     progress[i,7] <- proc.time()["elapsed"] - benchmark 
} 
progress[,2] <- sapply(1:length(progress[,1]), function(x) sum(progress[,1][1:x])) 
progress[,3] <- round(progress[,2]/6, digits=2) 

# The edge-list approach uses a list of all edges in the network to loop through instead 
# Here, I suppose, a lot of the extra speed could arise from R being better at looping 
# with lapply() and at finding data in a data.frame that the brute-force loop above is. 
el <- as.data.frame(as.matrix(get.edgelist(g,))) 

# This is ugly. Make the edgelist contain all edges as both i->j and j->i. In 
# the igraph object, they are only given as low i to high j by get.edgelist() 
    el.rev <- data.frame(el[,2], el[,1]) 
    names(el) <- names(el.rev) <- c("i","j") 
    el <- rbind(el, el.rev) 

# these nodes are connected (we'd only need to bother abouth non isolates) 
nodes <- sort(unique(c(el$i, el$j))) 
tri <- 0 

# Loop through connected nodes to find triangles in edge-list 
benchmark <- proc.time()["elapsed"] # Measure time for this loop 
for(i in nodes){ 
     i.neighbors <- el[el$i==i,]$j 
     # i's neighbors are the $j:s of the edgelist where $i:s are i. 

     k.list <- unlist(lapply(i.neighbors, function(x) intersect(i.neighbors,el[el$i==x, ]$j))) 
     # lists nodes that can be a k in an ijk-triangle for each of i's neighboring j:s 
     # If 1 has neighbors 2 and 3, el[el$i==x, ]$j) will be first, the neighbors of 2 and then 
     # the neighbors of 3. When intersected with the neighbors of i, k:s will be found. If 
     # {1,2,3} is a triangle one k will be 3 for {i=1, j=2}, and another k will be 2 for {i=1, j=3} 

     # k.list might be NULL 
     tri.for.this.i <- (as.numeric(length(k.list))/2) 
     # Here we devide by two since i can be in a triangle with j and k lik {ijk} and {ikj} 
     # We will later have to devide by 3 more, since each triangle will be counted for 
     # each node i that we loop through 

     # Save the counting to the progress 
     tri <- tri.for.this.i + tri 
     progress[i,4] <- as.numeric(tri.for.this.i) 
     mm <- c(mm, i) 
     progress[i,8] <- proc.time()["elapsed"] - benchmark 
} 
progress[,5] <- sapply(1:length(progress[,4]), function(x) sum(progress[,4][1:x])) 
progress[,6] <- round(progress[,5]/3, digits=2) 

# Fix the results into a usable format 
results <- data.frame(c("igraph", "adjacency-loop", "edge-loop"), 
         c(triangle.count, max(progress[,3]), max(progress[,6])), 
         c(gabor.Csardi.benchmark, (max(progress[,7]) - min(progress[,7])), (max(progress[,8]) - min(progress[,8])))) 
row.names(results) <- c("igraph", "Adjacensy-loop", "Edge-loop") 
names(results) <- c("Routine", "Triangle count", "Execution-time") 

# Now we have run two approaches of more or less the same thing. 
# Not only should the igraph triangle.count, and the two loops 
# be identical, but the progress of the two methods should too. 
progress[,3] == progress[,6] 
plot(progress[,6], type="l", col="blue") 
lines(progress[,7], col="green") 

# Look at the result: 
View(results)

來源

2016-05-18 10:50:46 nJGL

計算圖中三角形數量的有效算法是什麼？

回答

相關問題