2013-07-25 225 views
0

Find parent folders

I have a list of folders like this:

u'Magazines/testfolder1', 
u'Magazines/testfolder1/folder1/folder2/folder3', 
u'Magazines/testfolder1/folder1/', 
u'Magazines/testfolder1/folder1/folder2/', 
u'Magazines/testfolder2', 
u'Magazines/testfolder2/folder1/folder2/folder3', 
u'Magazines/testfolder2/folder1/', 
u'Magazines/testfolder2/folder1/folder2/', 
u'Magazines/testfolder3', 
u'Magazines/testfolder3/folder1/folder2/folder3', 
u'Magazines/testfolder3/folder1/', 
u'Magazines/testfolder3/folder1/folder2/', 

Now I want to keep only the unique parent folders in the list.

That is, in the example above I want to reduce the list to:

u'Magazines/testfolder1', 
u'Magazines/testfolder2', 
u'Magazines/testfolder3', 

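because all the other entries are subfolders of these.

I add folders to my database recursively, so if I have testfolder1 the script automatically recurses into its subfolders. So I don't need a subfolder in the list if its parent is also in the list.

How can I do this?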

Answers

2

Use a set:

>>> list_of_folders = [ 
...  u'Magazines/testfolder1', 
...  u'Magazines/testfolder1/folder1/folder2/folder3', 
...  u'Magazines/testfolder1/folder1/', 
...  u'Magazines/testfolder1/folder1/folder2/', 
...  u'Magazines/testfolder2', 
...  u'Magazines/testfolder2/folder1/folder2/folder3', 
...  u'Magazines/testfolder2/folder1/', 
...  u'Magazines/testfolder2/folder1/folder2/', 
...  u'Magazines/testfolder3', 
...  u'Magazines/testfolder3/folder1/folder2/folder3', 
...  u'Magazines/testfolder3/folder1/', 
...  u'Magazines/testfolder3/folder1/folder2/', 
... ] 
>>> result = set() 
>>> for folder in list_of_folders: 
...  for parent in result: 
...   if folder.startswith(parent): 
...    break 
...  else: 
...   result.add(folder) 
... 
>>> result 
{'Magazines/testfolder3', 'Magazines/testfolder2', 'Magazines/testfolder1'} 

UPDATE

list_of_folders = [ 
    ... 
] 
result = set() 
for folder in list_of_folders: 
    if all(not folder.startswith(parent) for parent in result): 
        result.add(folder) 
print(result) 
+0

I get an empty set. I put a small script here: http://codepad.org/CRGLJC4R. Could you take a look? – fdsgds

+1

Outdent the final `else`: it belongs to the `for`, not to the `if`. And make sure you read the Python documentation to understand how that works. –
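
To illustrate the point about `for ... else`: the `else` block belongs to the loop and runs only when the loop finishes without hitting `break`. Here is a minimal sketch; the helper name `has_parent_in` is made up for illustration and is not part of the answer above.

```python
def has_parent_in(folder, parents):
    """Return True if `folder` starts with any entry of `parents`."""
    for parent in parents:
        if folder.startswith(parent):
            break  # a matching prefix was found, so the else block is skipped
    else:
        # Reached only when the loop ran to completion without `break`
        # (this includes the case where `parents` is empty).
        return False
    return True


print(has_parent_in(u'Magazines/testfolder1/folder1/', [u'Magazines/testfolder1']))
print(has_parent_in(u'Magazines/testfolder2', [u'Magazines/testfolder1']))
```

If the `else` is indented to line up with the `if` instead, it runs once for every parent that does not match, which is a different program.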

+0

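@UlrichEckhardt, thanks, but if the parent folder is not at the top of the list I don't get the expected result. I mean, if you move `u'Magazines/testfolder1',` to the bottom, the result is different, e.g. this: http://codepad.org/0Te9aEmK – fdsgds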
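
One possible way around the ordering problem raised here is to sort the paths first, so that a parent is always seen before its children. The sketch below is not the answerer's code. It assumes relative paths separated by `/`, treats `a/b/` and `a/b` as the same folder, and uses a made-up function name `top_level_folders`.

```python
def top_level_folders(folders):
    """Keep only the folders that have no ancestor elsewhere in `folders`."""
    # Strip trailing slashes so 'a/b/' and 'a/b' count as the same folder.
    cleaned = set(f.rstrip('/') for f in folders)
    result = []
    # A parent is a strict prefix of its children, so it sorts before them.
    for folder in sorted(cleaned):
        # Compare against parent + '/' so that 'testfolder1' does not
        # swallow an unrelated sibling such as 'testfolder10'.
        if not any(folder.startswith(parent + '/') for parent in result):
            result.append(folder)
    return result


folders = [
    u'Magazines/testfolder2/folder1/',
    u'Magazines/testfolder10',
    u'Magazines/testfolder1/folder1/folder2/',
    u'Magazines/testfolder2',
    u'Magazines/testfolder1',  # parent listed after its children
]
print(top_level_folders(folders))
```

Checking against `parent + '/'` rather than the bare prefix is a deliberate choice: a plain `startswith(parent)` would also drop `testfolder10` because it begins with `testfolder1`.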

0

How about using a regular expression?

import re 

l = [ 
    u'Magazines/testfolder1', 
    u'Magazines/testfolder1/folder1/folder2/folder3', 
    u'Magazines/testfolder1/folder1/', 
    u'Magazines/testfolder1/folder1/folder2/', 
    u'Magazines/testfolder2', 
    u'Magazines/testfolder2/folder1/folder2/folder3', 
    u'Magazines/testfolder2/folder1/', 
    u'Magazines/testfolder2/folder1/folder2/', 
    u'Magazines/testfolder3', 
    u'Magazines/testfolder3/folder1/folder2/folder3', 
    u'Magazines/testfolder3/folder1/', 
    u'Magazines/testfolder3/folder1/folder2/', 
] 

expect = [ 
    u'Magazines/testfolder1', 
    u'Magazines/testfolder2', 
    u'Magazines/testfolder3', 
] 

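# Keep only paths with exactly two components (one '/' and no trailing slash).
# A list comprehension returns a list on both Python 2 and Python 3.
result = [x for x in l if re.match(r'^[^/]+/[^/]+$', x)] 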

assert expect == result 
0

Mate, I believe the following is what you are looking for:

lst = [ 
u'Magazines/testfolder1', 
u'Magazines/testfolder1/folder1/folder2/folder3', 
u'Magazines/testfolder1/folder1/', 
u'Magazines/testfolder1/folder1/folder2/', 
u'Magazines/testfolder2', 
u'Magazines/testfolder2/folder1/folder2/folder3', 
u'Magazines/testfolder2/folder1/', 
u'Magazines/testfolder2/folder1/folder2/', 
u'Magazines/testfolder3', 
u'Magazines/testfolder3/folder1/folder2/folder3', 
u'Magazines/testfolder3/folder1/', 
u'Magazines/testfolder3/folder1/folder2/' 
] 

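# Iterate over copies so that removing items does not skip elements.
for x in lst[:]:
    for y in lst[:]:
        # y is a subfolder of x only if it continues past a '/' boundary.
        if y != x and y.startswith(x.rstrip('/') + '/'):
            lst.remove(y)
print(lst)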

Output

[u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3'] 

This program repeatedly removes subfolders from the list, leaving only the parent folders.

0
l =[u'Magazines/testfolder1', 
    u'Magazines/testfolder1/folder1/folder2/folder3', 
    u'Magazines/testfolder1/folder1/', 
    u'Magazines/testfolder1/folder1/folder2/', 
    u'Magazines/testfolder2', 
    u'Magazines/testfolder2/folder1/folder2/folder3', 
    u'Magazines/testfolder2/folder1/', 
    u'Magazines/testfolder2/folder1/folder2/', 
    u'Magazines/testfolder3', 
    u'Magazines/testfolder3/folder1/folder2/folder3', 
    u'Magazines/testfolder3/folder1/', 
    u'Magazines/testfolder3/folder1/folder2/', ] 

mincount = min(s.count('/') for s in l) 
[d for d in sorted(l) if d.count('/') <= mincount] 
#=> [u'Magazines/testfolder1', u'Magazines/testfolder2', u'Magazines/testfolder3'] 

It isn't very clever, but it works where there is a common root.

+0

You assume the parent folder contains only one '/' or no '/'. What if the parent folder is 'a/b/c/d'? – falsetru

+0

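@falsetru The parent folder here is actually 'Magazines'. In any case, this is not a coherent task. If it really bothers you, you can find the minimum number of separators and use that as the count. – Marcin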

+0

If `l` is `['a', 'a/b', 'c/d']`, you will get `['a']`. But it should be `['a', 'c/d']`. – falsetru
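
For inputs like the one in this last comment, where the top-level folders sit at different depths, counting separators is not enough. One alternative is to drop a path only when one of its ancestors is itself in the list. This is a rough sketch, assuming Python 3.4+ (for `pathlib`) and `/`-separated paths; `drop_nested` is a hypothetical name, not something from the answers above.

```python
from pathlib import PurePosixPath


def drop_nested(folders):
    """Return the folders that have no ancestor present in `folders`."""
    # PurePosixPath normalises trailing slashes, so 'a/b/' and 'a/b' collapse.
    paths = set(PurePosixPath(f) for f in folders)
    # `p.parents` yields every ancestor of p (for 'a/b/c': 'a/b', 'a', '.').
    kept = [p for p in paths if not any(anc in paths for anc in p.parents)]
    return sorted(str(p) for p in kept)


print(drop_nested(['a', 'a/b', 'c/d']))
```

Because each ancestor is looked up in a set, the order of the input list does not matter, and `c/d` is kept since neither `c` nor `.` appears in the list.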