查找CSV文件中所有可能的路徑

我有一個具有以下格式的CSV文件：查找CSV文件中所有可能的路徑

header1,header2,header3,header4 
1,4,2,5 
1,4,0,5 
0,4,2,5

我的問題的相關信息，只在第1列和第3列。我試圖找到所有在此csv文件的可能的路徑，其中這兩個值被連接（在有向路徑），如果它們是相同的排。例如，在上述數據：

1 is connected to 2 
1 is connected to 0 
0 is connected to 2

然後所有可能的路徑是：

[1,2] 
[1,0,2] 
[0,2]

隨着網絡資源（特別是this）的幫助下，我已經能夠找到的所有路徑，一個指定的起始節點和結束節點。以下是我的代碼：

import csv 
def main(): 
    inputFile = "file_directory" 
    a =[] 
    with open(inputFile) as csvfile: 
     reader = csv.reader(csvfile) 
     next(reader) 
     for line in reader: 
     a.append([line[0], line[2]]) 
    # This will print all the paths starting with 1 and ending with 2 
    print(str(getAllSimplePaths('1', '2', a))) 


def getAllSimplePaths(originNode, targetNode, a): 
     return helpGetAllSimplePaths(targetNode, 
         [originNode], 
         set(originNode), 
         a, 
         list()) 


def helpGetAllSimplePaths(targetNode, currentPath, usedNodes, a, answerPaths): 
    lastNode = currentPath[-1] 
    if lastNode == targetNode: 
    answerPaths.append(list(currentPath)) 
    else: 
    for elem in a: 
     if elem[0] == lastNode: 
     if elem[1] not in usedNodes: 
      currentPath.append(elem[1]) 
      usedNodes.add(elem[1]) 
      helpGetAllSimplePaths(targetNode,currentPath,usedNodes,a,answerPaths) 
      usedNodes.remove(elem[1]) 
      currentPath.pop() 
    return answerPaths    


if __name__ == '__main__': 
    main()

當我運行此，我正確地得到以下結果：

[['1', '2'], ['1', '0', '2']]

不過，我真正想要做的是能遍歷所有元素我的csv文件的第二列，並找到每個元素的所有可能的路徑。我一直在爲此工作數天，我無法想出辦法做到這一點。我的csv文件大約有2000行。任何幫助/建議將不勝感激！謝謝！

更新：額外的信息

CSV文件中的每一行已經是兩個元素之間的路徑。所以，我有路徑的數目將等於我在我的csv文件的行數。現在，從在我的問題的例子中的第一行中，1被連接到2，因此[「1」，「2」]是一個路徑。對於每一行，我想通過查看同一行第三列（elem2）來查找第一列中元素（elem1）的連接情況，然後在第一列中搜索csv文件中的所有行以找到elem2。如果它存在於某行的第一列，則elem2必須連接到同一行第三列（elem3）中的對應元素。在這種情況下，我們的路徑是[elem1，elem2，elem3]。同樣，對於elem3，我將不得不查看所有行以查看它是否存在於第一列中。如果它不存在，那麼我完成了第一條路。接下來我繼續前進到第二條路。

上面的例子中所需的輸出應該是這樣的：

[['1','2'], ['1', '0', '2'], ['0', '2'], ['1','0']]

我使用Python 3.5.1。

來源

2017-01-08 User 1

是否有周期，例如一個可能性有其中'1'被連接到'0'和'0'連接到'1'的情況？如果是，那麼這種情況下期望的輸出是什麼？ – niemmi

是的，有循環的可能性。但是，上面的代碼旨在檢測並且不允許發生週期。例如，如果1連接到2並且2連接到1，則假定1是起始節點並且2是終止節點，程序將僅輸出[['1'，'2']]。 –

首先，我建議你在你的問題中修復代碼的縮進。此外，「每個元素的所有可能路徑」是什麼意思 - 從什麼到什麼？ – martineau

編輯

下面是我優化多一點版本。在將它用於非常大的csv文件之前，我建議您刪除一些/大部分的打印內容 - 這不會影響最終結果。

import csv 
from pprint import pprint, pformat 

def main(): 
    inputFile = "paths.csv" 
    with open(inputFile, newline='') as csvfile: 
     reader = csv.reader(csvfile) 
     next(reader) 
     a = [[row[0], row[2]] for row in reader] 
    print('a:\n', pformat(a)) 

    # construct an adjacency *dictionary* 
    nodeToNodes = {} 
    for src, dst in a: 
     nodeToNodes.setdefault(src, []).append(dst) 
    print('\nnodeToNodes:\n', pformat(nodeToNodes)) 

    print('\ngathering results:') 
    all_paths = [] 
    for src, dst in a: 
     print(' {} <-> {}'.format(src, dst)) 
     more_paths = getAllSimplePaths(dst, [src], {src}, nodeToNodes, []) 
     print(' {}'.format(pformat(more_paths))) 
     all_paths.extend(more_paths) 

    print('\nall paths: {}'.format(pformat(all_paths))) 

def getAllSimplePaths(targetNode, currentPath, usedNodes, nodeToNodes, answerPaths): 
    lastNode = currentPath[-1] 
    if lastNode == targetNode: 
     answerPaths.append(currentPath[:]) 
    elif lastNode in nodeToNodes: 
     for neighbor in nodeToNodes[lastNode]: 
      if neighbor not in usedNodes: 
       currentPath.append(neighbor) 
       usedNodes.add(neighbor) 
       getAllSimplePaths(targetNode, currentPath, usedNodes, nodeToNodes, 
            answerPaths) 
       usedNodes.remove(neighbor) 
       currentPath.pop() 

    return answerPaths 

if __name__ == '__main__': 
    main()

輸出：

a: 
[['1', '2'], ['1', '0'], ['0', '2']] 

nodeToNodes: 
{'0': ['2'], '1': ['2', '0']} 

gathering results: 
    1 <-> 2 
    [['1', '2'], ['1', '0', '2']] 
    1 <-> 0 
    [['1', '0']] 
    0 <-> 2 
    [['0', '2']] 

all paths: [['1', '2'], ['1', '0', '2'], ['1', '0'], ['0', '2']]

來源

2017-01-08 20:26:37 martineau

謝謝！這適用於小型csv文件。我嘗試將它應用於具有2000行的csv文件，並且它無限運行。有沒有辦法更有效地做到這一點？ –

不客氣。我注意到，由於我在'main（）'的最後'for'循環中進行了打印，所以'getAllSimplePaths（）'函數對於同一對節點被稱爲_twice_。我已經從我的答案中刪除了這個，這會讓它更快。不知道你的2000行csv文件有多少，儘管... – martineau

我已經做了一些更多的優化來加速它。如果你想提供2000行csv文件（通過下載鏈接），我會看看是否還有更多我可以做的 - 假設它仍然太慢。 – martineau

查找CSV文件中所有可能的路徑

回答

相關問題