Performing a set difference operation on a list of tuples

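I want to get the difference between 2 containers, but one of the containers has an odd structure, so I don't know the best way to perform a difference on them. I cannot change the type and structure of one container, but I can change the other one (delims).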

import collections

delims = ['on','with','to','and','in','the','from','or']
words = collections.Counter(s.split()).most_common()
# words results in [("a",9), ("the",2), ("diplomacy", 1)]

# I want to perform a 'difference' operation on words to remove all the delims words
descriptive_words = set(words) - set(delims)

# Because of the unique structure of words (a list of tuples), it's hard to perform a difference
# on it. What would be the best way to perform a difference? Maybe...

delims = [('on',0),('with',0),('to',0),('and',0),('in',0),('the',0),('from',0),('or',0)] 
words = collections.Counter(s.split()).most_common() 
descriptive_words = set(words) - set(delims) 

# Or maybe 
words = collections.Counter(s.split()).most_common() 
n_words = [] 
for w in words: 
    n_words.append(w[0]) 
delims = ['on','with','to','and','in','the','from','or'] 
descriptive_words = set(n_words) - set(delims)
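A quick way to see why the first attempt removes nothing, assuming a sample sentence `s` that reproduces the counts in the comment above: the elements of `set(words)` are `(word, count)` tuples, and a tuple never compares equal to a bare string, so the difference leaves the set unchanged. Projecting out the words first, as the last attempt does, makes the elements comparable, at the cost of dropping the counts.

```python
import collections

# Sample sentence (an assumption) chosen to reproduce the counts in the question
s = "the a a a a the a a a a a diplomacy"
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']
words = collections.Counter(s.split()).most_common()

# Elements are (word, count) tuples, so no string in delims can ever match one
unchanged = set(words) - set(delims)
print(unchanged == set(words))  # True

# Comparing words with words works: take the first item of each tuple
descriptive_words = {word for word, _ in words} - set(delims)
print(sorted(descriptive_words))  # ['a', 'diplomacy']
```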


2012-03-29 Jake M

How about just modifying words by deleting all of the delimiters?

import collections

words = collections.Counter(s.split())
for delim in delims:
    # remove each delimiter's entry from the counts
    del words[delim]
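Two details make this approach convenient, sketched below with a made-up sentence: `Counter`'s `del`, unlike a plain dict's, does not raise `KeyError` when a delimiter never occurred in the text, and `most_common()` can still be called afterwards if the list-of-tuples form is needed.

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # hypothetical input
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']

words = collections.Counter(s.split())
for delim in delims:
    # Counter.__delitem__ silently skips absent keys such as 'on' or 'with'
    del words[delim]

print(words.most_common())  # [('a', 9), ('diplomacy', 1)]
```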


2012-03-29 09:44:01

Looks like it works and I think I'll use it, but words is a list of tuples, so how can I say "words[delim]"? – 2012-03-29 09:45:53

@JakeM - Apply it directly to the Counter object. – eumiro 2012-03-29 09:48:38

Ah, I was thinking of words as the Counter object. – 2012-03-29 09:49:03
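If only the list of tuples is at hand (for example, because it already came out of `most_common()`), one option, sketched here, is to rebuild a `Counter` from it first: `dict()` accepts `(key, value)` pairs, and `Counter` accepts a mapping of counts.

```python
import collections

# A list of (word, count) tuples, as returned by most_common()
words = [("a", 9), ("the", 2), ("diplomacy", 1)]
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']

counter = collections.Counter(dict(words))  # back to a Counter
for delim in delims:
    del counter[delim]

print(counter.most_common())  # [('a', 9), ('diplomacy', 1)]
```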

This is how I would do it:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
# list() is needed in Python 3, where filter returns an iterator
descriptive_words = list(filter(lambda x: x[0] not in delims, words))

This uses filter. An equally viable alternative is a list comprehension:

delims = set(['on','with','to','and','in','the','from','or']) 
# ... 
descriptive_words = [(word, count) for word, count in words if word not in delims]

Make sure that delims is a set, which allows O(1) lookup.
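A side-by-side sketch of the two variants from this answer, assuming the sample sentence from the question's comment; under Python 3 `filter` returns a lazy iterator, so it is wrapped in `list()` to compare the results.

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # sample input (assumed)
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}  # set: O(1) average lookup
words = collections.Counter(s.split()).most_common()

# Variant 1: filter + lambda
via_filter = list(filter(lambda x: x[0] not in delims, words))

# Variant 2: list comprehension with tuple unpacking
via_comprehension = [(word, count) for word, count in words if word not in delims]

print(via_filter == via_comprehension)  # True
print(via_comprehension)  # [('a', 9), ('diplomacy', 1)]
```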


2012-03-29 09:41:18 brice

The first approach uses 'in'; does that mean we iterate over all of the delimiters for every comparison? – 2012-03-29 09:48:44

Not if they are sets or dicts. Lookup is O(1), [the docs say](http://wiki.python.org/moin/TimeComplexity). – brice 2012-03-29 09:51:25
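To check the complexity claim empirically, a rough `timeit` sketch like the one below can be used. The lookup word, the helper `time_lookup`, and the repetition count are arbitrary choices for illustration, and timings depend on the machine, so none are quoted here. The word is chosen to be absent, which is the worst case for a list scan; with only eight delimiters the gap is small, and it grows with the size of delims.

```python
import timeit

delims_list = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']
delims_set = set(delims_list)

def time_lookup(container, word="diplomacy", number=100000):
    """Total seconds for `number` membership tests of `word` in `container`."""
    return timeit.timeit(lambda: word in container, number=number)

list_time = time_lookup(delims_list)  # scans every element: O(len(delims))
set_time = time_lookup(delims_set)    # hash lookup: O(1) on average
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```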

If you're iterating over it anyway, why bother converting them to sets?

# delims here is the list of (word, 0) tuples from the question
dwords = [delim[0] for delim in delims]
words = [word for word in words if word[0] not in dwords]


2012-03-29 09:42:08

@RobYoung Yes, I'm trying to avoid iterating over them, for efficiency. Any solution that doesn't iterate is best, I think. – 2012-03-29 09:47:19

Bad idea. This would be O(n^2), wouldn't it? – brice 2012-03-29 09:50:13
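Following up on the complexity concern, a minimal fix that keeps this answer's shape, assuming `delims` is the list of `(word, 0)` tuples from the question, is to build `dwords` as a set instead of a list, so each membership test is an average O(1) hash lookup rather than a scan.

```python
import collections

s = "the a a a a the a a a a a diplomacy"  # sample input (assumed)
# The tuple form of delims from the question
delims = [('on', 0), ('with', 0), ('to', 0), ('and', 0),
          ('in', 0), ('the', 0), ('from', 0), ('or', 0)]
words = collections.Counter(s.split()).most_common()

dwords = {delim[0] for delim in delims}  # set comprehension instead of a list
words = [word for word in words if word[0] not in dwords]
print(words)  # [('a', 9), ('diplomacy', 1)]
```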

For performance reasons, you can use a lambda function:

filter(lambda word: word[0] not in delims, words)


2012-03-29 09:51:16 FallenAngel

filter + lambda is less readable than a list comprehension, and a list comprehension can [often be faster](http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Loops). – 2012-03-29 10:11:49

Secondly, since delims is a list, it's still doing O(n^2). – brice 2012-03-29 10:27:35
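Applying both comments to this answer gives something like the following sketch: delims becomes a set so each test is O(1) on average, and, because Python 3's `filter` returns a one-shot iterator rather than a list, the result is materialized with `list()` if it needs to be reused.

```python
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}
words = [("a", 9), ("the", 2), ("diplomacy", 1)]

lazy = filter(lambda word: word[0] not in delims, words)
print(list(lazy))  # [('a', 9), ('diplomacy', 1)]
print(list(lazy))  # [] because the iterator is already exhausted

# Materialize once if the result is needed more than once
descriptive_words = list(filter(lambda word: word[0] not in delims, words))
```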

The simplest answer is to do:

import collections 

s = "the a a a a the a a a a a diplomacy" 
delims = {'on','with','to','and','in','the','from','or'} 
# For older versions of Python without set literals:
# delims = set(['on','with','to','and','in','the','from','or'])
words = collections.Counter(s.split())

not_delims = {key: value for (key, value) in words.items() if key not in delims}
# For older versions of Python without dict comprehensions:
# not_delims = dict(((key, value) for (key, value) in words.items() if key not in delims))

This gives us:

{'a': 9, 'diplomacy': 1}
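Note that the dict comprehension produces a plain `dict`, so `Counter` methods such as `most_common()` are no longer available. If they are still wanted, one option, sketched here with the same input, is to wrap the comprehension in `collections.Counter`:

```python
import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}
words = collections.Counter(s.split())

# Wrapping the filtered mapping keeps the Counter API
not_delims = collections.Counter(
    {key: value for key, value in words.items() if key not in delims}
)
print(not_delims.most_common())  # [('a', 9), ('diplomacy', 1)]
```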

Another way is to filter pre-emptively:

import collections 

s = "the a a a a the a a a a a diplomacy" 
delims = {'on','with','to','and','in','the','from','or'} 
counted_words = collections.Counter((word for word in s.split() if word not in delims))

Here, you apply the filter to the word list before handing it to the Counter, which gives the same result.
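A quick check, assuming the same sample sentence, that filtering before counting and filtering after counting agree; the pre-emptive version also never counts the delimiters at all. The Counter is converted with `dict()` so that the comparison is a plain dict-to-dict equality.

```python
import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}

# Filter after counting
words = collections.Counter(s.split())
after = {key: value for key, value in words.items() if key not in delims}

# Filter before counting
before = collections.Counter(word for word in s.split() if word not in delims)

print(dict(before) == after)  # True
```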


2012-03-29 10:02:20
