2013-06-18 50 views
3

我有一個數據文件,它具有分類數據作爲列。如如何找到原始數據的最短路徑

node_id,second_major,gender,major_index,year,dorm,high_school,student_fac 
0,0,2,257,2007,111,2849,1 
1,0,2,271,2005,0,51195,2 
2,0,2,269,2007,0,21462,1 
3,269,1,245,2008,111,2597,1 
.......................... 

此數據爲列。我將它轉換爲edgelist和nodelist。 喜歡的EdgeList都:

0 4191 
0 949 
1 3002 
1 4028 
1 957 
2 2494 
2 959 
2 3011 
3 4243 
4 965 
5 1478 
........ 
........ 

到底有什麼工作要做,以找到節點之間的最短路徑。沒有權重邊緣。我應該如何在python中實現這個代碼?

+1

刪除'matplotlib'標籤,因爲我沒有看到這是如何做與繪圖有關。如果你不同意可以放回去。 – tacaswell

回答

2

這是一個經典的廣度優先搜索問題,你有一個無向圖,你想找到兩個頂點之間的最短路徑。

廣度優先搜索一些有用的鏈接:

,你必須注意的一些邊緣情況:

  • 源和des之間沒有路徑tination頂點
  • 源和目標是相同的頂點

我會假設你的邊緣名單列表的字典或列表的列表,如。

[[4191, 949], [3002, 4028, 957], [2494, 959, 3011], [4243, 965], [1478], ...] 

或者

{ 0: [4191, 949], 
    1: [3002, 4028, 957], 
    2: [2494, 959, 3011], 
    3: [4243, 965], 
    4: [1478], ...} 

我已經寫了一些代碼來顯示廣度優先搜索是如何工作的:

import sys 
import sys 
import Queue 

def get_shortest_path(par, src, dest): 
    ''' 
    Returns the shortest path as a list of integers 
    par - parent information 
    src - source vertex 
    dest - destination vertex 
    ''' 
    if dest == src: 
     return [src] 
    else: 
     ret = get_shortest_path(par, src, par[dest]) 
     ret.append(dest) 
     return ret 

def bfs(edgeList, src, dest): 
    ''' 
    Breadth first search routine. Returns (distance, shortestPath) pair from src to dest. Returns (-1, []) if there is no path from src to dest 
    edgeList - adjacency list of graph. Either list of lists or dict of lists 
    src - source vertex 
    dest - destination vertex 
    ''' 
    vis = set() # stores the vertices that have been visited 
    par = {} # stores parent information. vertex -> parent vertex in BFS tree 
    distDict = {} # stores distance of visited vertices from the source. This is the number of edges between the source vertex and the given vertex 
    q = Queue.Queue() 
    q.put((src, 0)) # enqueue (source, distance) pair 
    par[src] = -1 # source has no parent 
    vis.add(src) # minor technicality, will explain later 
    while not q.empty(): 
     (v,dist) = q.get() # grab vertex in queue 
     distDict[v] = dist # update the distance 
     if v == dest: 
      break # reached destination, done 
     nextDist = dist+1 
     for nextV in edgeList[v]: 
      # visit vertices adjacent to the current vertex 
      if nextV not in vis: 
       # not yet visited 
       par[nextV] = v # update parent of nextV to v 
       q.put((nextV, nextDist)) # add into queeu 
       vis.add(nextV) # mark as visited 
    # obtained shortest path now 
    if dest in distDict: 
     return (distDict[dest], get_shortest_path(par, src, dest)) 
    else: 
     return (-1, []) # no shortest path 

# example run, feel free to remove this 
if __name__ == '__main__': 
    edgeList = { 
     0: [6,], 
     1: [2, 7], 
     2: [1, 3, 6], 
     3: [2, 4, 5], 
     4: [3, 8], 
     5: [3, 7], 
     6: [0, 2], 
     7: [1, 5], 
     8: [4], 
    } 
    while True: 
     src = int(sys.stdin.readline()) 
     dest = int(sys.stdin.readline()) 
     (dist, shortest_path) = bfs(edgeList, src, dest) 
     print 'dist =', dist 
     print 'shortest_path =', shortest_path 
+1

當然沒問題。對我來說,使用Python編寫bfs也是一種很好的做法=) – yanhan