2011-09-13 75 views
24

計算無​​向圖中三角形數量的有效算法是什麼(其中圖形是一組頂點和邊)?我一直在搜索谷歌,連續三天閱讀我的書架,每天讀數小時。計算圖中三角形數量的有效算法是什麼?

這是一個家庭作業的作業,我需要這樣一個算法,但是開發它並不包括任務上的任何內容。預計我們可以簡單地從外部資源中找到這樣一種算法,但我已經走到了盡頭。

爲了澄清,圖中的三角形是長度爲三的循環。訣竅是它需要在最多有10,000個節點的頂​​點集上工作。

我目前在C#中工作,但更關心解決此問題的一般方法,而不是複製和粘貼代碼。

在較高水平,我嘗試迄今包括:

  • 廣度,可以跟蹤長度爲三的所有獨特的循環優先搜索。這對我來說似乎是個好主意,但我無法實現它的功能
  • 在圖中的所有節點上循環查看三個頂點是否共享邊。這對於較大的數據集來說運行時間太慢了。爲O(n^3)。

該算法本身是計算聚類係數的一部分。

+0

檢查這對於C程序執行力度[三角形的在給定的圖形數目(http://www.msccomputerscience.com/2014/04/wap-to-find-number-of-triangle-in- given.html) – ARJUN

回答

12

這個問題和矩陣乘法一樣困難。請參閱reference

你知道關於圖表的任何事嗎?它們稀疏嗎?如果沒有,我認爲你不會比O(n^3)做得更好。

+0

它們確實是稀疏圖形,但出於測試目的,我正在密集圖上運行分析。 – XBigTK13X

0

取決於您的圖表如何表示。

如果有一個鄰接矩陣A,三角形的數量應該是tr(A^3)/ 6,換句話說,是對角線元素總和的1/6(​​這個分割負責定位和旋轉)。

如果您有鄰接列表,只需從每個節點開始並執行深度3搜索。計算你到達該節點的頻率 - >再次除以6。

+0

在無向圖中,您只需要進行深度二搜索,然後檢查是否有深度二節點與深度一一匹配。 – han

+0

所以我正在做一個鄰接表實現,你的回答對我有幫助,但是你能解釋爲什麼你除以6嗎?我明白這是爲了消除冗餘,但你能向我解釋你是如何到達6的?在我看來,每個三角形將被計數3次(每個可能的源頂點一次),所以爲什麼不除以3? –

+0

@JarrydGoodman A^3的每個對角線值是從一個節點到它自身的長度爲3的路徑的數量。因此,你在一個節點上計算每個三角形兩次(因爲你有兩個方向),並且在所有三個節點上計數六次。 – Yfiua

0

如果您不關心三角形的確切數量,那麼有一個非常簡單的流式算法可以提供無偏估計。有關說明,請參閱here

3

您將需要深度優先搜索。該算法將是:

1)對於當前節點詢問所有未訪問的相鄰節點

2)爲每個節點上運行深度兩個檢查,看是否在深度2的節點是您的當前節點從步驟一個

3)標記當前節點作爲化妝訪問

4),每個未訪問的鄰接節點的當前節點(1 1)和運行相同的算法

+0

我有着完全相同的算法,但是當你說「對於當前節點請求所有未訪問的相鄰節點」時,你究竟是什麼意思,因爲一個節點由x和y座標組成。在考慮其他節點作爲相鄰節點時,我們是否應該根據當前節點的x和y座標來決定?例如(1,2),(2,3),(3,1),(3,2),(1,3)哪些節點與哪個節點相鄰? – Harshdeep

+0

我認爲同樣的事情,但以不同的方式應用來自任何節點的dfs(使用堆棧),並且保持父節點的軌跡ie(從節點出現的位置)也將節點標記爲遍歷並將節點的邊緣從所有節點移除那麼如果在dfs期間發現遍歷的節點,則檢查該節點的父節點(即遍歷的節點),並且如果父節點的父節點相同則意味着存在三角形,則從堆棧中彈出該節點。 –

0

如這裏引用:https://math.stackexchange.com/a/117030

找到並計算三角形的最快算法依賴於快速矩陣乘積並具有O(nω)時間複雜度,其中ω< 2.376是快速矩陣乘積指數。然而,這種方法導致θ(n2)空間複雜度。

1

三角形計數確實很困難,而且計算成本很高。也許這是瞭解原因的一個很好的起點:Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning。 (n *(n-1)),並且繼續週期以查看n的鄰居的鄰居的鄰居是否是n:(n *(n-1)),其中n = 1, )(n-1)(n-1),對於10000 n幾乎爲10^16。有一百萬個節點,這些循環變得愚蠢,但是對於你的10000個人來說,如果你想暴力破解,你應該沒有任何問題:)

你提到過你用C#編寫代碼,並且它可以用於C )有一個很好的計算Gabor Csardi寫的三角形的算法。它在我的10000個節點的隨機圖中計算了130萬個三角形,在一臺5年的筆記本計算機上,它在1.3秒內計算了100萬個邊緣:) Gabor Csardi會是個問問的人:)

就不同的編程方法而言,應該查看您存儲網絡的數據。如果存儲在鄰接矩陣中,則循環的數量是固定的,但是在三個邊的網絡的邊緣列表中,循環的數量是三的倍數,與節點的數量無關。你可以問一個節點的鄰居的邊界列表,而不必測試每一個i-> j的組合。

我在R中編寫了一個教學腳本來說明方法,並測量不同算法的速度,這是一個非常基本的方法。在這裏使用R有很多固有的速度問題(邊緣列表版本完全被太多邊緣所淹沒),但是我認爲代碼示例應該能夠得到一些關於如何思考速度的想法,迫使三角形計數。這是在R,並不是非常整潔,但很好評論。我希望你能打破語言障礙。

所有最優秀的。

# Counting triangles in a random graph using igraph and two different 
# and almost equally stupid approaches looping through the 1) adjacency 
# matrix and 2) the edge-list in R. 

# Use igraph and these configs 
library(igraph) 
V <- 100 
E <- 1700 

# This is the random graph that we will use 
g <- erdos.renyi.game(type="gnm", n=V, p=E, directed=FALSE, loops=FALSE) 

# igraph has such a fast algorythm. Long live Gabor Csardi! 
benchmark <- proc.time()["elapsed"] 
     triangle.count <- sum(count_triangles(g)/3) 
gabor.Csardi.benchmark <- proc.time()["elapsed"] - benchmark 

# For not to big networks we can plot them to get a basic feel 
if(length(V(g)) < 100){ 
     V(g)$size <- 5 
     plot(g) 
} 

# The adjacency matrix approach will have to deal with a crazy 
# ammount of looping through pairs of matrix combinations to 
# find links: 

# We'll loop through each node to check it's participation in triangles 
# notice that a triangle ijk will be participated in by nodes 
# i, j, and k, and that each of those nodes have two triangular counts. 
# All in all the structures ijk, ikj, jik, jki, kij, kji are each counted 
# but shall be returned as 1 triangle. We will therefore devide our 
# search-result by 6 later on. 

# Store our progess in this matrix to look at how we did later on 
progress <- matrix(0, nrow=length(V(g)), ncol=8) 

# Loop through all nodes to find triangles in an adjacency matrix 
benchmark <- proc.time()["elapsed"] # Measure time for this loop 
for(i in 1:length(V(g))){ 
     # Node i has connections to these nodes: 
     i.neighbors <- as.vector(neighborhood(g, 1, nodes=i)[[1]]) 
     i.neighbors <- setdiff(i.neighbors, c(i)) # i should not be part of its own neighborhood 

     # for each i, tri is the number of triangles that i is involved in 
     # using any j or any k. For a triangle {1,2,3}, tri will be 2 for 
     # i==1, since i is part of both triangles {1,2,3} and {1,3,2}: 
     tri <- 0 

     for(j in i.neighbors) 
     { 
       # Node j has connections to these nodes: 
       j.neighbors <- as.vector(neighborhood(g, 1, nodes=j)[[1]]) 
       j.neighbors <- setdiff(j.neighbors, c(j)) # j should not be part of its own neighborhood 

       # Were any of j's neighbors also a neighbor of i? 
       k <- intersect(i.neighbors, j.neighbors) 

       tri <- tri + length(k) 
     } 

     # Save our findings to the progress matrix 
     progress[i,1] <- tri 
     progress[i,7] <- proc.time()["elapsed"] - benchmark 
} 
progress[,2] <- sapply(1:length(progress[,1]), function(x) sum(progress[,1][1:x])) 
progress[,3] <- round(progress[,2]/6, digits=2) 

# The edge-list approach uses a list of all edges in the network to loop through instead 
# Here, I suppose, a lot of the extra speed could arise from R being better at looping 
# with lapply() and at finding data in a data.frame that the brute-force loop above is. 
el <- as.data.frame(as.matrix(get.edgelist(g,))) 

# This is ugly. Make the edgelist contain all edges as both i->j and j->i. In 
# the igraph object, they are only given as low i to high j by get.edgelist() 
    el.rev <- data.frame(el[,2], el[,1]) 
    names(el) <- names(el.rev) <- c("i","j") 
    el <- rbind(el, el.rev) 

# these nodes are connected (we'd only need to bother abouth non isolates) 
nodes <- sort(unique(c(el$i, el$j))) 
tri <- 0 

# Loop through connected nodes to find triangles in edge-list 
benchmark <- proc.time()["elapsed"] # Measure time for this loop 
for(i in nodes){ 
     i.neighbors <- el[el$i==i,]$j 
     # i's neighbors are the $j:s of the edgelist where $i:s are i. 

     k.list <- unlist(lapply(i.neighbors, function(x) intersect(i.neighbors,el[el$i==x, ]$j))) 
     # lists nodes that can be a k in an ijk-triangle for each of i's neighboring j:s 
     # If 1 has neighbors 2 and 3, el[el$i==x, ]$j) will be first, the neighbors of 2 and then 
     # the neighbors of 3. When intersected with the neighbors of i, k:s will be found. If 
     # {1,2,3} is a triangle one k will be 3 for {i=1, j=2}, and another k will be 2 for {i=1, j=3} 

     # k.list might be NULL 
     tri.for.this.i <- (as.numeric(length(k.list))/2) 
     # Here we devide by two since i can be in a triangle with j and k lik {ijk} and {ikj} 
     # We will later have to devide by 3 more, since each triangle will be counted for 
     # each node i that we loop through 

     # Save the counting to the progress 
     tri <- tri.for.this.i + tri 
     progress[i,4] <- as.numeric(tri.for.this.i) 
     mm <- c(mm, i) 
     progress[i,8] <- proc.time()["elapsed"] - benchmark 
} 
progress[,5] <- sapply(1:length(progress[,4]), function(x) sum(progress[,4][1:x])) 
progress[,6] <- round(progress[,5]/3, digits=2) 

# Fix the results into a usable format 
results <- data.frame(c("igraph", "adjacency-loop", "edge-loop"), 
         c(triangle.count, max(progress[,3]), max(progress[,6])), 
         c(gabor.Csardi.benchmark, (max(progress[,7]) - min(progress[,7])), (max(progress[,8]) - min(progress[,8])))) 
row.names(results) <- c("igraph", "Adjacensy-loop", "Edge-loop") 
names(results) <- c("Routine", "Triangle count", "Execution-time") 

# Now we have run two approaches of more or less the same thing. 
# Not only should the igraph triangle.count, and the two loops 
# be identical, but the progress of the two methods should too. 
progress[,3] == progress[,6] 
plot(progress[,6], type="l", col="blue") 
lines(progress[,7], col="green") 

# Look at the result: 
View(results)