2017-09-09 72 views
1

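Remove duplicates from a nested list (without removing duplicate elements inside the sublists)

I am trying to remove duplicate sublists from a nested list that looks like this: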

result_set = [ 
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'], 
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], 
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], 
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'], 
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'] 
    ] 

I want the following output:

result_set = [ 
    ['MEMS', 'MEMS', 'MEMS', 'MEMS'], 
    ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], 
    ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], 
    ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'] 
    ] 

Note that the only change is that the last element, ['MEMS', 'MEMS', 'MEMS', 'MEMS'], is gone. Similar questions have been asked before, and I adapted the following code from them:

result_set = set(frozenset(x) for x in result) 
lst = [list(x) for x in result_set] 

My problem is that I get the following output:

result_set = [['MEMS'], ['Microfluidics'], ['Microfabrication', 'Clean-Room Microfabrication'], ['Photolithography', 'Lithography']] 

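Notice that it also removes the duplicate elements inside each sublist. I do not want that, because my goal afterwards is to plot a histogram, e.g. MEMS occurs 4 times. So I want to keep track of how many elements each sublist originally had.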

+1

If your question has been answered, you should [accept](https://stackoverflow.com/help/someone-answers) the most helpful answer.

Answers

3

If order does not matter, you can use a set:

final_data = list(map(list, set(map(tuple, result_set)))) 

Output:

[['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography'], ['MEMS', 'MEMS', 'MEMS', 'MEMS']] 
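
As a side note on why the tuple conversion works where the frozenset attempt in the question did not: a tuple keeps every element of a sublist, while a frozenset keeps only the distinct values. The following is a minimal sketch on a shortened, made-up sample (the name `sample` and its contents are illustrative, not the asker's data):

```python
# Shortened, made-up sample in the same shape as result_set.
sample = [
    ['MEMS', 'MEMS', 'MEMS'],
    ['Photolithography', 'Lithography', 'Photolithography'],
    ['MEMS', 'MEMS', 'MEMS'],
]

# frozenset collapses repeats inside each sublist, so the counts are lost.
as_frozensets = {frozenset(sub) for sub in sample}

# tuple keeps every element, so only whole-sublist duplicates are merged.
as_tuples = {tuple(sub) for sub in sample}

# Convert back to lists; the order of the sublists is not guaranteed.
deduped = [list(t) for t in as_tuples]
```

Because a set has no defined order, the sublists in `deduped` may come out in a different order on each run, which matches the shuffled output shown above.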

If order matters, you can try this:

final_data = [] 
for result in result_set: 
    if result not in final_data: 
        final_data.append(result) 

Output:

[['MEMS', 'MEMS', 'MEMS', 'MEMS'], ['Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics', 'Microfluidics'], ['Microfabrication', 'Microfabrication', 'Microfabrication', 'Clean-Room Microfabrication', 'Microfabrication', 'Microfabrication'], ['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']] 
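
The loop above checks each sublist against every sublist kept so far, which can get slow for a long list. One possible variation, assuming all elements are hashable (strings are), is to remember the sublists already seen as tuples in a set. The helper name `dedupe_sublists` and the small `sample` are hypothetical, chosen only for this sketch:

```python
def dedupe_sublists(nested):
    """Return nested without repeated sublists, keeping first occurrences in order."""
    seen = set()
    unique = []
    for sub in nested:
        key = tuple(sub)  # lists are unhashable; tuples can be stored in a set
        if key not in seen:
            seen.add(key)
            unique.append(sub)
    return unique


# Made-up sample in the same shape as result_set.
sample = [['MEMS', 'MEMS'], ['Lithography'], ['MEMS', 'MEMS']]
print(dedupe_sublists(sample))  # [['MEMS', 'MEMS'], ['Lithography']]
```

The repeated elements inside each sublist are untouched, so the per-sublist counts needed for a histogram are still available.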
0

Use collections.OrderedDict to retain the order of the unique items.

from collections import OrderedDict 

out = list(
    map(
        list, OrderedDict.fromkeys(map(tuple, result_set)).keys() 
    ) 
) 
print(out) 

[['MEMS', 'MEMS', 'MEMS', 'MEMS'], 
['Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics', 
    'Microfluidics'], 
['Microfabrication', 
    'Microfabrication', 
    'Microfabrication', 
    'Clean-Room Microfabrication', 
    'Microfabrication', 
    'Microfabrication'], 
['Photolithography', 'Photolithography', 'Lithography', 'Photolithography']] 
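
On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea can be written without the import. This is a sketch on a made-up `sample`, not a replacement for the code above:

```python
# Made-up sample in the same shape as result_set.
sample = [['MEMS', 'MEMS'], ['Lithography'], ['MEMS', 'MEMS']]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
out = [list(t) for t in dict.fromkeys(map(tuple, sample))]
print(out)  # [['MEMS', 'MEMS'], ['Lithography']]
```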
0

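Sort the list, then build a new list from the keys produced by itertools.groupby(). Note that sort() reorders result_set in place, so the original order is not kept.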

import itertools 
result_set.sort() 
new_set = [k for k,g in itertools.groupby(result_set)]
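
itertools.groupby() only merges items that are next to each other, which is why the list has to be sorted first. If the original list should stay untouched, one option is to group a sorted copy instead. A minimal sketch on a made-up `sample`:

```python
import itertools

# Made-up sample in the same shape as result_set.
sample = [['MEMS', 'MEMS'], ['Lithography'], ['MEMS', 'MEMS']]

# sorted() returns a new list, so sample itself is left unchanged.
unique_sorted = [key for key, _group in itertools.groupby(sorted(sample))]
print(unique_sorted)  # [['Lithography'], ['MEMS', 'MEMS']]
```

The result comes back in sorted order rather than in the order the sublists first appeared.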