2010-10-08 27 views
3

Pythonic way to optimize filter logic / extract data from a list

I have a list like the one below:

['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', 
'3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', 
'5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
'7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
'9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS())', 
'22 (UID 3479 FLAGS())', '23 (UID 3480 FLAGS())', '24 (UID 3481 FLAGS())'] 

From this list I want three different lists as the result, and I want to build them in a single iteration over the list.

  1. List of all UIDs, i.e. [3234, 3235, 3236, 3237, 3241 ...]
  2. List of seen UIDs, i.e. [3234, 3235 ...] <- UIDs of items that have the \Seen flag
  3. List of deleted UIDs, i.e. [3236, 3253] <- UIDs of items that have the \Deleted flag
+2

What is the significance of "+FLAGS" and "-FLAGS"? – PaulMcG 2010-10-08 09:23:06

+0

What does "FLAGS (seen \\Seen)" mean (in entry #1)? – PaulMcG 2010-10-08 09:27:35

+0

What do you have so far? – SilentGhost 2010-10-08 09:50:23

Answers

3

The best thing to do is to turn your data into a dict that maps each UID to its FLAGS; searching it is then easy. The data would look something like this:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS', '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '', '3252': '\\Seen', '3253':'\\Deleted', '3478': '', '3479': '', '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen', '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS', '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'} 

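You can do this using a Regular Expression to match each entry in the list. If the regex returns two groups per match (the UID and the flags), we can easily build the dict from them.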
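To make the two-group idea concrete, here is a minimal sketch run against two entries copied from the question. The optional space after `FLAGS` is an assumption added here so that entries written as `FLAGS()` also match.

```python
import re

# Group 1 captures the UID digits, group 2 whatever sits inside the FLAGS parentheses.
# Assumption: " ?" makes the space after FLAGS optional, to cope with "FLAGS()" entries.
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS ?\(([^)]*)\)\)")

uid, flags = pattern.match('3 (UID 3236 FLAGS (\\Deleted))').groups()
print(uid, flags)

# An entry without flags yields an empty string for the second group.
empty = pattern.match('13 (UID 3254 FLAGS())').groups()
print(empty)
```

Each `groups()` result is a `(uid, flags)` pair, which is exactly the shape `dict()` accepts for building a mapping.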

So we end up with something like this:

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', '13 (UID 3254 FLAGS())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS())', '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', '18 (UID 3431 FLAGS())', '19 (UID 3434 FLAGS (\\Seen))', '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS())', '22 (UID 3479 FLAGS())', '23 (UID 3480 FLAGS())', '24 (UID 3481 FLAGS())'] 

import re 
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS ?\(([^)]*)\)\)")  # " ?" lets "FLAGS()" entries match too
values = dict(pattern.match(item).groups() for item in items) 

Then we can easily query the items in values to get what you want:

print("All UIDs:", list(values.keys()))
print("Seen UIDs:", [uid for uid, flags in values.items() if r"\Seen" in flags])
print("Deleted UIDs:", [uid for uid, flags in values.items() if r"\Deleted" in flags])
+0

Aren't you iterating over the list of items several times in your solution to get the "seen" and "deleted" ones? – 2010-10-08 09:11:34

+0

@Noufal Ibrahim - Yes. I assumed the list was not very long, so I valued readability over performance. – 2010-10-08 13:40:42

+0

I completely agree with your approach. The asker requested a single iteration, which is why I brought it up. – 2010-10-08 14:37:06

1

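I'm not sure about list comprehensions, since those usually map one list to another (by filtering or mapping). I haven't seen them used to split a list. However, you can do this in a single iteration with a combination of a genexp and a loop. I've spelled it out a bit so that it's clear.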

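import re
# " ?" makes the space after FLAGS optional so entries written as "FLAGS()" also match
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS ?(?P<flags>\(.*\))\)')

t = [...]  # your list

# Generator expression: each entry is parsed lazily as the loop below consumes it
items = (grepper.search(m).groupdict() for m in t)

all_uids = []  # named all_uids rather than all to avoid shadowing the built-in
seen = []
deleted = []
for i in items:
    # Test the captured flags text, not the dict itself
    if "Seen" in i["flags"]:
        seen.append(i["uid"])
    if "Deleted" in i["flags"]:
        deleted.append(i["uid"])
    all_uids.append(i["uid"])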

Now you should have your 3 lists.

+0

You're iterating over the list twice :( – slezica 2010-10-08 09:21:30

+0

Where? [15 characters...] – 2010-10-08 09:59:39

+0

Technically, once in grepper.search and then again in "for i in items". – 2010-10-08 11:41:58
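For what it's worth, a generator expression is lazy, so the parsing and the loop body interleave in one pass rather than running as two separate passes. The sketch below records the order of operations to show this; the `parse` wrapper, the `trace` list and the two-entry `sample` are made up for the illustration, and the optional space after `FLAGS` in the pattern is an assumption.

```python
import re

grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS ?(?P<flags>\(.*\))\)')
sample = ['2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))']

trace = []  # records the order in which parsing and consuming happen

def parse(entry):
    # Hypothetical wrapper around grepper.search that logs each call
    trace.append(("parse", entry.split(' ')[0]))
    return grepper.search(entry).groupdict()

items = (parse(m) for m in sample)
parsed_before_loop = len(trace)  # the genexp has not parsed anything yet

for i in items:
    trace.append(("use", i["uid"]))

print(parsed_before_loop)
print(trace)
```

The trace alternates between "parse" and "use", so each entry is parsed and consumed before the next one is touched.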

1
all, deleted, seen = [list(filter(None, a)) for a in
    zip(*map(lambda a: (a[2], r'\Deleted' in a[-1] and a[2], r'\Seen' in a[-1] and a[2]), map(lambda a: a.split(' '), items)))]

As for which is faster, using re or not using re, you'd need to check with timeit!
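A rough sketch of how such a timeit comparison could be set up is below. It only times extracting the UID, not the flag filtering; the helper names `with_re` and `with_split`, the small repeated sample and the repeat count are all arbitrary choices for the illustration. No timing figures are claimed here, since they depend on the machine and the Python version.

```python
import re
import timeit

# A small sample repeated to give timeit something to chew on (sizes are arbitrary).
items = ['2 (UID 3235 FLAGS (\\Seen))',
         '3 (UID 3236 FLAGS (\\Deleted))',
         '13 (UID 3254 FLAGS())'] * 100

# Assumption: " ?" makes the space after FLAGS optional so "FLAGS()" entries match.
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS ?\(([^)]*)\)\)")

def with_re(entries):
    # UID taken from the first regex group
    return [pattern.match(e).group(1) for e in entries]

def with_split(entries):
    # UID is the third space-separated field
    return [e.split(' ')[2] for e in entries]

t_re = timeit.timeit(lambda: with_re(items), number=200)
t_split = timeit.timeit(lambda: with_split(items), number=200)
print(f"re: {t_re:.4f}s  split: {t_split:.4f}s")
```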

+1

Oh boy, I'm not sure I'd want to see that in production code. :) – 2010-10-08 15:43:01

+0

ohhh too many lambda filter map zip..... :-) – shahjapan 2010-10-09 04:33:08

0
all = []
seen = []
deleted = []
for item in alist:  # alist is the list from the question
    s = item.split(" ", 4)
    all.append(s[2])  # the UID is the third space-separated field
    if "seen" in s[-1].lower():
        seen.append(s[2])
    elif "delete" in s[-1].lower():
        deleted.append(s[2])
0

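The only way I can think of to build the three lists you asked for in one iteration is to iterate manually. There's no Python magic that I can come up with.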

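If you know the details of the format and how it is generated, you can easily improve on this. For example, I don't know why +FLAGS and -FLAGS appear in some items, and I don't know when to expect the parentheses, so I had to use find(). Also, I just used split() to break up the string, but then again, I don't know what the flags format means...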

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        # spl[4:] holds the flag words, if any; raw strings keep the backslash literal
        for word in spl[4:]:
            if word.find(r"\Seen") != -1:
                lseen.append(uid)

            elif word.find(r"\Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted
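Given the uncertainty about the flag format mentioned above, one alternative to substring search is to split the flags into whole tokens and test membership. This is only a sketch under the assumption that flags are whitespace-separated words inside the parentheses that follow the first `FLAGS`; the helper name `flag_tokens` is made up for the illustration.

```python
def flag_tokens(entry):
    # Everything after the first "FLAGS" is treated as the flags part (an assumption).
    _, _, tail = entry.partition("FLAGS")
    # Drop surrounding whitespace and parentheses, then split into whole words.
    return set(tail.strip().strip("()").split())

tokens_1 = flag_tokens('1 (UID 3234 FLAGS (seen \\Seen))')
tokens_4 = flag_tokens('4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))')
tokens_13 = flag_tokens('13 (UID 3254 FLAGS())')

print(tokens_1, tokens_4, tokens_13)
print('\\Seen' in tokens_1, '\\Deleted' in tokens_1)
```

With whole tokens, a bare `seen` is kept apart from `\Seen`, and the odd `-FLAGS`/`+FLAGS` words stay visible so they can be handled once their meaning is known.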
2
import re 

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', 
'3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', 
'5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
'7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
'9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS())', 
'22 (UID 3479 FLAGS())', '23 (UID 3480 FLAGS())', '24 (UID 3481 FLAGS())'] 

# Raw string for the pattern; \s? lets entries written as "FLAGS()" match as well
r = re.compile(r'\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s?\((?P<data>.*)\)\)')
uid_list = []
seen_uid_list = []
deleted_uid_list = []
for s in data:
    m = r.match(s)
    if m:
        uid_list.append(m.group('uid'))
        if r'\Seen' in m.group('data'): seen_uid_list.append(m.group('uid'))
        if r'\Deleted' in m.group('data'): deleted_uid_list.append(m.group('uid'))

print(uid_list)
print(seen_uid_list)
print(deleted_uid_list)
1

This one works for your data sample...

uids, seen, deleted = [], [], [] 
for item in myList: 
    uids.append(int(item[7:12])) 
    if 'Se' in item[20:]: seen.append(uids[-1]) 
    elif 'De' in item[20:]: deleted.append(uids[-1])