2017-05-17 95 views
0

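I am trying to filter the list of lists below, dropping every sublist that does not contain certain POS tags, but I cannot get it to work. How do I filter out the lists that do not contain all the elements of another list?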

a = ['VBG', 'RB', 'NNP'] 

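From the list of lists of tuples below, I only want the sublists that contain all of the tags above to appear in the output list. (The tags below may be incorrect; they are for illustration only.)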

data = [[('User', 'NNP'), 
     ('is', 'VBG'), 
     ('not', 'RB'), 
     ('able', 'JJ'), 
     ('to', 'TO'), 
     ('order', 'NN'), 
     ('products', 'NNS'), 
     ('from', 'IN'), 
     ('iShopCatalog', 'NN'), 
     ('Coala', 'NNP'), 
     ('excluding', 'VBG'), 
     ('articles', 'NNS'), 
     ('from', 'IN'), 
     ('VWR', 'NNP')], 
    [('Arfter', 'NNP'), 
     ('transferring', 'VBG'), 
     ('the', 'DT'), 
     ('articles', 'NNS'), 
     ('from', 'IN'), 
     ('COALA', 'NNP'), 
     ('to', 'TO'), 
     ('SRM', 'VB'), 
     ('the', 'DT'), 
     ('Category', 'NNP'), 
     ('S9901', 'NNP'), 
     ('Dummy', 'NNP'), 
     ('is', 'VBZ'), 
     ('maintained', 'VBN')], 
    [('Due', 'JJ'), 
     ('to', 'TO'), 
     ('this', 'DT'), 
     ('the', 'DT'), 
     ('user', 'NN'), 
     ('is', 'VBZ'), 
     ('not', 'RB'), 
     ('able', 'JJ'), 
     ('to', 'TO'), 
     ('order', 'NN'), 
     ('the', 'DT'), 
     ('product', 'NN')], 
    [('All', 'DT'), 
     ('other', 'JJ'), 
     ('users', 'NNS'), 
     ('can', 'MD'), 
     ('order', 'NN'), 
     ('these', 'DT'), 
     ('articles', 'NNS')], 
    [('She', 'PRP'), 
     ('can', 'MD'), 
     ('order', 'NN'), 
     ('other', 'JJ'), 
     ('products', 'NNS'), 
     ('from', 'IN'), 
     ('a', 'DT'), 
     ('POETcatalog', 'NNP'), 
     ('without', 'IN'), 
     ('any', 'DT'), 
     ('problems', 'NNS')], 
    [('Furtheremore', 'IN'), 
     ('she', 'PRP'), 
     ('is', 'VBZ'), 
     ('able', 'JJ'), 
     ('to', 'TO'), 
     ('order', 'NN'), 
     ('products', 'NNS'), 
     ('from', 'IN'), 
     ('the', 'DT'), 
     ('Vendor', 'NNP'), 
     ('VWR', 'NNP'), 
     ('through', 'IN'), 
     ('COALA', 'NNP')], 
    [('But', 'CC'), 
     ('articles', 'NNP'), 
     ('from', 'VBG'), 
     ('all', 'RB'), 
     ('other', 'JJ'), 
     ('suppliers', 'NNS'), 
     ('are', 'NNP'), 
     ('not', 'VBG'), 
     ('orderable', 'RB')], 
    [('I', 'PRP'), 
     ('already', 'RB'), 
     ('spoke', 'VBD'), 
     ('to', 'TO'), 
     ('anic', 'VB'), 
     ('who', 'WP'), 
     ('maintain', 'VBP'), 
     ('the', 'DT'), 
     ('catalog', 'NN'), 
     ('COALA', 'NNP'), 
     ('and', 'CC'), 
     ('they', 'PRP'), 
     ('said', 'VBD'), 
     ('that', 'IN'), 
     ('the', 'DT'), 
     ('reason', 'NN'), 
     ('should', 'MD'), 
     ('be', 'VB'), 
     ('the', 'DT'), 
     ('assignment', 'NN'), 
     ('of', 'IN'), 
     ('the', 'DT'), 
     ('plant', 'NN')], 
    [('User', 'NNP'), 
     ('is', 'VBZ'), 
     ('a', 'DT'), 
     ('assinged', 'JJ'), 
     ('to', 'TO'), 
     ('Universitaet', 'NNP'), 
     ('Regensburg', 'NNP'), 
     ('in', 'IN'), 
     ('Scout', 'NNP'), 
     ('but', 'CC'), 
     ('in', 'IN'), 
     ('P17', 'NNP'), 
     ('table', 'NN'), 
     ('YESRMCDMUSER01', 'NNP'), 
     ('she', 'PRP'), 
     ('is', 'VBZ'), 
     ('assigned', 'VBN'), 
     ('to', 'TO'), 
     ('company', 'NN'), 
     ('001500', 'CD'), 
     ('Merck', 'NNP'), 
     ('KGaA', 'NNP')], 
    [('Please', 'NNP'), 
     ('find', 'VB'), 
     ('attached', 'JJ'), 
     ('some', 'DT'), 
     ('screenshots', 'NNS')]] 

My expected output is:

data = [[('User', 'NNP'), 
    ('is', 'VBG'), 
    ('not', 'RB'), 
    ('able', 'JJ'), 
    ('to', 'TO'), 
    ('order', 'NN'), 
    ('products', 'NNS'), 
    ('from', 'IN'), 
    ('iShopCatalog', 'NN'), 
    ('Coala', 'NNP'), 
    ('excluding', 'VBG'), 
    ('articles', 'NNS'), 
    ('from', 'IN'), 
    ('VWR', 'NNP')], 
    [('But', 'CC'), 
    ('articles', 'NNP'), 
    ('from', 'VBG'), 
    ('all', 'RB'), 
    ('other', 'JJ'), 
    ('suppliers', 'NNS'), 
    ('are', 'NNP'), 
    ('not', 'VBG'), 
    ('orderable', 'RB')]]

I tried to do this with the code below, but it did not work:

list1 = []
for i in data:
    list2 = []
    a = ['VBG', 'RB', 'NNP']
    for j in i:
        if all(i in j[1] for i in a):
            list2.append(j)
    list1.append(list2)
list1

This returns a list of empty lists. Could anyone provide simple, easy-to-understand code that produces my expected output? Thanks.

Answers

2

Your condition here:

if all(i in j[1] for i in a): 

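checks whether every tag in a is contained in the single tag j[1], and appends the item only if so. At most one of them will be (given your data), which is why you get a list of empty lists. Instead, you want: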

In [32]: from operator import itemgetter
    ...: list1 = []
    ...: a = ['VBG', 'RB', 'NNP']
    ...: for sub in data:
    ...:     # collect the distinct tags of this sublist
    ...:     tags = set(map(itemgetter(1), sub))
    ...:     if all(s in tags for s in a):
    ...:         list1.append(sub)
    ...:

This checks that *all* of the items in a are in the set of tags built from the sublist...

In [33]: list1 
Out[33]: 
[[('User', 'NNP'), 
    ('is', 'VBG'), 
    ('not', 'RB'), 
    ('able', 'JJ'), 
    ('to', 'TO'), 
    ('order', 'NN'), 
    ('products', 'NNS'), 
    ('from', 'IN'), 
    ('iShopCatalog', 'NN'), 
    ('Coala', 'NNP'), 
    ('excluding', 'VBG'), 
    ('articles', 'NNS'), 
    ('from', 'IN'), 
    ('VWR', 'NNP')], 
[('But', 'CC'), 
    ('articles', 'NNP'), 
    ('from', 'VBG'), 
    ('all', 'RB'), 
    ('other', 'JJ'), 
    ('suppliers', 'NNS'), 
    ('are', 'NNP'), 
    ('not', 'VBG'), 
    ('orderable', 'RB')]] 
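
To make the difference concrete, here is a minimal sketch on a shortened, made-up sample (the names sample, per_token, per_sublist and has_all_tags are illustrative and not part of the question). It suggests that the per-token test from the question can never succeed for three different tags, while a per-sublist test keeps the sublists that contain all of them.

```python
from operator import itemgetter

a = ['VBG', 'RB', 'NNP']

# Shortened, hypothetical sample in the same shape as the question's data.
sample = [
    [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ')],
    [('All', 'DT'), ('other', 'JJ'), ('users', 'NNS')],
    [('She', 'PRP'), ('can', 'MD'), ('POETcatalog', 'NNP')],
]

# Per-token test from the question: `t in tok[1]` is a substring check on a
# single tag string, so all three tags can never match the same token.
per_token = [[tok for tok in sub if all(t in tok[1] for t in a)]
             for sub in sample]


def has_all_tags(sub, required):
    """Return True if every tag in `required` occurs somewhere in `sub`."""
    tags = set(map(itemgetter(1), sub))
    return set(required).issubset(tags)


# Per-sublist test: keep a sublist only if it contains all required tags.
per_sublist = [sub for sub in sample if has_all_tags(sub, a)]

print(per_token)    # [[], [], []]
print(per_sublist)  # only the first sublist of sample
```

Wrapping the check in a small function keeps the filtering line readable and makes it easy to reuse with a different list of required tags.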
+0

@DYZ Thanks, that was left over from my original answer.

1

This solution may look completely weird, but it works:

a = set(a)  # the required tags, as a set
def match(x):
    # split the (word, tag) pairs into a tuple of words and a tuple of tags
    words, tags = zip(*x)
    # True when every required tag occurs among the sublist's tags
    return set(tags) & a == a
list(filter(match, data))
#[[('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ'),
# ('to', 'TO'), ('order', 'NN'), ('products', 'NNS'), ('from', 'IN'),
# ('iShopCatalog', 'NN'), ('Coala', 'NNP'), ('excluding', 'VBG'),
# ('articles', 'NNS'), ('from', 'IN'), ('VWR', 'NNP')],
# [('But', 'CC'), ('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB'),
# ('other', 'JJ'), ('suppliers', 'NNS'), ('are', 'NNP'), ('not', 'VBG'),
# ('orderable', 'RB')]]
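
The same filter can also be sketched as a list comprehension with the subset operator. This is an alternative spelling rather than the answer's original code, and filter_by_tags, required and sample are illustrative names. Building the tag set directly from the second element of each tuple avoids going through a dict, which would keep only the last tag of a repeated word.

```python
def filter_by_tags(sublists, required):
    """Keep the sublists whose tags include every tag in `required`."""
    required = set(required)
    # `required <= tags` is True when `required` is a subset of `tags`.
    return [sub for sub in sublists
            if required <= {tag for _word, tag in sub}]


# Hypothetical sample; 'is' appears twice with different tags.
sample = [
    [('is', 'VBG'), ('not', 'RB'), ('Coala', 'NNP'), ('is', 'VBZ')],
    [('Please', 'NNP'), ('find', 'VB')],
]

result = filter_by_tags(sample, ['VBG', 'RB', 'NNP'])
print(result == [sample[0]])  # True

# A dict maps each word to its last tag only, so 'VBG' is lost here.
print('VBG' in dict(sample[0]).values())  # False
```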
+0

I thought that was what the OP wanted at first, but look closely at the desired output!

+0

Right. Changed it to something even crazier :) – DyZ

+1

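Hey, nice! I would use 'set(map(itemgetter(1), x))' rather than 'set(dict(x).values())', which is a bit less crazy :). Note that in Python 3 only the keys and items views support set operations; the values view does not, so the set() call is still needed.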
