從頁面內容字典創建層次樹

以下鍵值對是'頁面'和'頁面內容'。從頁面內容字典創建層次樹

{ 
    'section-a.html':{'contents':'section-b.html section-c.html section-d.html'}, 
    'section-b.html':{'contents':'section-d.html section-e.html'}, 
    'section-c.html':{'contents':'product-a.html product-b.html product-c.html product-d.html'}, 
    'section-d.html':{'contents':'product-a.html product-c.html'}, 
    'section-e.html':{'contents':'product-b.html product-d.html'}, 
    'product-a.html':{'contents':''}, 
    'product-b.html':{'contents':''}, 
    'product-c.html':{'contents':''}, 
    'product-d.html':{'contents':''} 
}

對於任何給定的'物品'，我怎麼能找到所述物品的路徑？由於我在大多數情況下對數據結構的知識非常有限，我假設這將是一個層次樹。如果我錯了，請糾正我的錯誤！

更新：我的歉意，我應該更清楚數據和我的預期結果。

假設'page-a'是一個索引，每個'page'實際上是一個頁面出現在網站上，其中每個'item'類似於產品頁面，會出現在Amazon，Newegg等上。

因此，我對'item-d'的預期輸出將是該項目的一個路徑（或多個路徑）。例如（定界符是任意的，爲了說明在這裏）：項-d具有以下路徑：

page-a > page-b > page-e > item-d 
page-a > page-c > item-d

UPDATE2：更新我的原始dict以提供更精確的和真實的數據。 '.html'補充說明。

來源

2009-11-27 gibson

我真的不知道你的意思。你能舉個例子嗎？爲什麼你使用字符串作爲內容，而不是列表['item-a'，'item-b']？ – extraneon 2009-11-27 17:02:59

字符串是數據如何提供給我的。因爲我可以使用。split（）在一個字符串上，我不認爲我必須使用一個列表。舉一個例子：我幾乎只是尋找一個網站地圖（各種）。如果這沒有幫助，我可以嘗試給你真實的數據。 – gibson 2009-11-27 17:09:48

'item-a'會返回什麼？它似乎是在頁面c和頁面d？ – nosklo 2009-11-27 17:16:40

下面是一個簡單的方法 - 它是O（N平方），因此，並非所有這些高度可擴展的，但將爲您合理的書容量（如果你有，例如，數百萬頁，你需要思考一個非常不同的，不那麼簡單的方法;-)。

首先，使成爲更可用字典，映射頁面設置的內容：例如，如果原始字典是d，使另一字典mud爲：

mud = dict((p, set(d[p]['contents'].split())) for p in d)

然後，使字典映射每一頁其父網頁：

parent = dict((p, [k for k in mud if p in mud[k]]) for p in mud)

在這裏，我使用的是父頁面的列表（套將被罰款過），但是這與0或1頁的父母在你的榜樣，太行 - 你會只是使用一個空列表來表示「沒有父母」，否則就是一個列表作爲唯一的項目。這應該是一個非循環的有向圖（如果你有疑問，當然可以檢查，但我忽略了這個檢查）。

現在，給定一個頁面，找到其父母的路徑到一個無父母父母（「根頁面」）只需要「走」parent字典。例如，在0/1父案：

path = [page] 
while parent[path[-1]]: 
    path.append(parent[path[-1]][0])

如果你能更好地闡明你的規格（每書頁，每頁家長的數量，等等數範圍），這個代碼可以毫無疑問的是精煉，但作爲一個開始，我希望它可以提供幫助。

編輯：作爲OP澄清，用> 1個父的情況下（因此，多路徑）是確實的興趣，讓我告訴你怎麼處理是：

partial_paths = [ [page] ] 
while partial_paths: 
    path = partial_paths.pop() 
    if parent[path[-1]]: 
    # add as many partial paths as open from here 
    for p in parent[path[-1]]: 
     partial_paths.append(path + [p]) 
    else: 
    # we've reached a root (parentless node) 
    print(path)

當然，代替當它到達一個根（使其身體成爲一個生成器的函數）時，您可以yield每個路徑，或者以任何您需要的方式處理它。

再次編輯：評論者擔心圖中的週期。如果擔心這種擔心，跟蹤已經在路徑中看到的節點並檢測並警告任何週期並不困難。最快的是保持（在列表中，我們需要的列表排序，但檢查的成員爲O（1）在集合VS O（N）），一組沿着每個列表代表一個部分路徑：

partial_paths = [ ([page], set([page])) ] 
while partial_paths: 
    path, pset = partial_paths.pop() 
    if parent[path[-1]]: 
    # add as many partial paths as open from here 
    for p in parent[path[-1]]: 
     if p in pset: 
     print('Cycle: %s (%s)' % (path, p)) 
     continue 
     partial_paths.append((path + [p], pset.union([p]))) 
    else: 
    # we've reached a root (parentless node) 
    print('Path: %s' % (path,))

這可能值得一提的是，爲了清晰起見，將列表打包並用合適的方法將部分路徑設置爲小實用程序類路徑。

來源

2009-11-27 17:11:50

謝謝您的回答。我通過更多的澄清更新了我的原始問題。您的代碼仍然適用，除非它似乎沒有考慮到多個路徑。 – gibson 2009-11-27 18:20:01

@gibson，tx澄清 - 讓我展開答案，展示如何處理> 1父級的節點。 – 2009-11-28 00:18:20

亞歷克斯，你爲什麼認爲該圖將是非循環的？還是你的意思是「應該」，「這應該是非循環的，否則會有問題」？ – 2009-11-28 20:33:50

編輯隨着問題解釋好一點，我認爲以下可能是你需要的，或至少可以提供一些起點。

data = { 
    'section-a.html':{'contents':'section-b.html section-c.html section-d.html'}, 
    'section-b.html':{'contents':'section-d.html section-e.html'}, 
    'section-c.html':{'contents':\ 
        'product-a.html product-b.html product-c.html product-d.html'}, 
    'section-d.html':{'contents':'product-a.html product-c.html'}, 
    'section-e.html':{'contents':'product-b.html product-d.html'}, 
    'product-a.html':{'contents':''}, 
    'product-b.html':{'contents':''}, 
    'product-c.html':{'contents':''}, 
    'product-d.html':{'contents':''} 
} 

def findSingleItemInData(item): 
    return map(lambda x: (item, x), \ 
       [key for key in data if data[key]['contents'].find(item) <> -1]) 

def trace(text): 
    searchResult = findSingleItemInData(text) 
    if not searchResult: 
     return text 

    retval = [] 
    for item in searchResult: 
     retval.append([text, trace(item[-1])]) 

    return retval 

print trace('product-d.html')

OLD

我真的不知道，你希望看到什麼，但也許像這會工作。

data = { 
    'page-a':{'contents':'page-b page-c'}, 
    'page-b':{'contents':'page-d page-e'}, 
    'page-c':{'contents':'item-a item-b item-c item-d'}, 
    'page-d':{'contents':'item-a item-c'}, 
    'page-e':{'contents':'item-b item-d'} 
} 

itemToFind = 'item-c' 

for key in data: 
    for index, item in enumerate(data[key]['contents'].split()): 
    if item == itemToFind: 
     print key, 'at position', index

它會更容易，我覺得更多的是正確的，如果你使用略有不同數據結構：

data = { 
    'page-a':{'contents':['page-b', 'page-c']}, 
    'page-b':{'contents':['page-d', 'page-e']}, 
    'page-c':{'contents':['item-a', 'item-b item-c item-d']}, 
    'page-d':{'contents':['item-a', 'item-c']}, 
    'page-e':{'contents':['item-b', 'item-d']} 
}

然後，你就不會需要拆分。

鑑於去年的情況下，它甚至可以表達短一點：

for key in data: 
    print [ (key, index, value) for index,value in \ 
      enumerate(data[key]['contents']) if value == 'item-c' ]

甚至更短，與空列表中刪除：

print filter(None, [[ (key, index, value) for index,value in \ 
     enumerate(data[key]['contents']) if value == 'item-c' ] for key in data])

這應該是一個單一的線，但我使用\作爲換行符指示符，因此它可以被讀取爲而不使用滾動條。

來源

2009-11-27 17:13:44 extraneon

非常感謝您的努力。你的第三和第四塊代碼似乎在正確的軌道上，但是它們沒有提供任何有用的輸出。塊＃3不輸出所有空列表和＃4輸出。 – gibson 2009-11-27 17:41:03

您是否在它之前添加了打印以查看生成的內容？ – extraneon 2009-11-27 20:27:59

我做到了。我發現的錯誤是在我自己的代碼中。我忘了在'內容'中使用列表而不是空格分隔。愚蠢的錯誤。但是，此輸出似乎不提供任何給定頁面的路徑。正如我的（現在更新，澄清）問題所述，我正在尋找每個頁面的「路徑」（頁面位於網站上）。 – gibson 2009-11-27 21:29:19

下面是您的問題的圖解。當你有圖片時，更容易推理圖。

首先，縮寫數據：

#!/usr/bin/perl -pe 
s/section-([a-e])\.html/uc$1/eg; s/product-([a-e])\.html/$1/g

結果：

# graph as adj list 
DATA = { 
    'A':{'contents':'B C D'}, 
    'B':{'contents':'D E'}, 
    'C':{'contents':'a b c d'}, 
    'D':{'contents':'a c'}, 
    'E':{'contents':'b d'}, 
    'a':{'contents':''}, 
    'b':{'contents':''}, 
    'c':{'contents':''}, 
    'd':{'contents':''} 
}

轉換爲的Graphviz的格式：

with open('data.dot', 'w') as f: 
    print >> f, 'digraph {' 
    for node, v in data.iteritems(): 
     for child in v['contents'].split(): 
      print >> f, '%s -> %s;' % (node, child), 
     if v['contents']: # don't print empty lines 
      print >> f 
    print >> f, '}'

結果：

digraph { 
A -> C; A -> B; A -> D; 
C -> a; C -> b; C -> c; C -> d; 
B -> E; B -> D; 
E -> b; E -> d; 
D -> a; D -> c; 
}

繪製圖表：

 
$ dot -Tpng -O data.dot

data.dot

來源

2009-11-28 15:30:15 jfs

從頁面內容字典創建層次樹

回答

相關問題