Element-wise comparison of tuples, returning a set of sets - python

I am new to Python. Could someone help me with this problem? I have a dataset with the attributes in the first row and the records in the remaining rows.

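My requirement is to compare each record with every other record and give the attribute names of the elements that differ. So in the end I should have a set of sets as output.

For example, if I have 3 records with 3 columns like this: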

        Col1  Col2  Col3
tuple1  H     C     G
tuple2  H     M     G
tuple3  L     M     S

It should give me something like this:

tuple1, tuple2 = {Col2}
tuple1, tuple3 = {Col1, Col2, Col3}
tuple2, tuple3 = {Col1, Col3}

And the final output should be {{Col2}, {Col1, Col2, Col3}, {Col1, Col3}}

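Here is the code I have tried.

What I do now is read each row into a list. So there is one list with all the attributes (named list_attr) and a list of lists for the rows (named rows). Then, for each record, I loop over the other records, compare each element, and collect the indices of the differing elements to get the attribute names. Finally I convert them to sets. My code is below, but the problem is that I have 50k records and 15 attributes to process, so this loop takes a very long time to run. Is there another way to do this quickly, or to improve performance?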

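dis_sets = []
length = len(list_attr)  # assumed: number of attributes; not defined in the original snippet
for l in rows:
    for l1 in rows:
        if l != l1:
            i = 0
            in_sets = []  # names of the attributes where l and l1 differ
            while i < length:
                if l[i] != l1[i]:
                    in_sets.append(list_attr[i])
                i = i + 1
            if in_sets != []:
                dis_sets.append(in_sets)
# deduplicate: a set of frozensets of attribute names
skt = set(frozenset(temp) for temp in dis_sets)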

Source

2014-09-03 ds_user

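It looks like you understood the requirement correctly. Now try writing a solution, and when you run into a problem you cannot solve, post it here. – alfasin 2014-09-03 02:57:43

I tried writing the code and ended up with repeated loops, so it takes a long time, and I am looking for a better alternative. – 2014-09-03 02:58:56

Try writing down the algorithm (the steps) needed to solve the problem. Then write a function for each step. Once each step runs on its own, try combining the steps into a workflow: call each step in order from a main function. Trying to "swallow" all the logic into one giant function is bad for reading, debugging, testing, and maintenance (as you have already experienced). – alfasin 2014-09-03 03:02:26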

Consider:

>>> tuple1=('H', 'C', 'G') 
>>> tuple2=('H', 'M', 'G') 
>>> tuple3=('L', 'M', 'S')

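OK, you state, "My requirement is to compare each record with every other record and give the attribute names of the elements that differ."

Put that into code: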

>>> [i for i, t in enumerate(zip(tuple1, tuple2), 1) if t[0]!=t[1]] 
[2] 
>>> [i for i, t in enumerate(zip(tuple1, tuple3), 1) if t[0]!=t[1]] 
[1, 2, 3] 
>>> [i for i, t in enumerate(zip(tuple2, tuple3), 1) if t[0]!=t[1]] 
[1, 3]

Then you state, "the final output should be {{Col2},{Col1,Col2,Col3},{Col1,Col3}}"

Since a set of sets loses order, that makes no sense. It should be:

>>> [[i for i, t in enumerate(zip(*pair), 1) if t[0]!=t[1]] for pair in 
...  [(tuple1, tuple2), (tuple1, tuple3), (tuple2, tuple3)]] 
[[2], [1, 2, 3], [1, 3]]

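If you really want sets, you can use them for the sub-elements; if you have a true set of sets, you lose the information about which pair produced which result.

A list of sets: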

>>> [{i for i, t in enumerate(zip(*pair), 1) if t[0]!=t[1]} for pair in 
...  [(tuple1, tuple2), (tuple1, tuple3), (tuple2, tuple3)]] 
[set([2]), set([1, 2, 3]), set([1, 3])]

And nearly your desired output:

>>> [{'Col{}'.format(i) for i, t in enumerate(zip(*pair), 1) if t[0]!=t[1]} for pair in 
...  [(tuple1, tuple2), (tuple1, tuple3), (tuple2, tuple3)]] 
[set(['Col2']), set(['Col2', 'Col3', 'Col1']), set(['Col3', 'Col1'])]

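(Note that, since sets are unordered, the order of the strings differs. If the top-level order changes as well, what do you have?)

Note that with a list of lists, you are closer to your desired output: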

>>> [['Col{}'.format(i) for i, t in enumerate(zip(*pair), 1) if t[0]!=t[1]] for pair 
...  in [(tuple1, tuple2), (tuple1, tuple3), (tuple2, tuple3)]] 
[['Col2'], ['Col1', 'Col2', 'Col3'], ['Col1', 'Col3']]

Edit based on comments

You can do something like:

def pairs(LoT):
    # for production code, consider using a deque of tuples...
    LoT = list(LoT)  # work on a copy so the caller's list is not emptied
    seen = set()  # hold the pair combinations seen
    while LoT:
        f = LoT.pop(0)
        for e in LoT:
            se = frozenset([f, e])
            if se not in seen:
                seen.add(se)
                yield se

>>> list(pairs([('H', 'C', 'G'), ('H', 'M', 'G'), ('L', 'M', 'S')])) 
[frozenset([('H', 'M', 'G'), ('H', 'C', 'G')]), frozenset([('L', 'M', 'S'), ('H', 'C', 'G')]), frozenset([('H', 'M', 'G'), ('L', 'M', 'S')])]

Then use it like so:

>>> LoT=[('H', 'C', 'G'), ('H', 'M', 'G'), ('L', 'M', 'S')] 
>>> [['Col{}'.format(i) for i, t in enumerate(zip(*pair), 1) if t[0]!=t[1]] for pair 
...  in pairs(LoT)] 
[['Col2'], ['Col1', 'Col2', 'Col3'], ['Col1', 'Col3']]
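
As an aside, the standard library's `itertools.combinations` yields each unordered pair exactly once and leaves the input list untouched, so it could stand in for a hand-written `pairs`. A minimal sketch, assuming the same three tuples (the helper name `diff_columns` is made up for illustration):

```python
from itertools import combinations

def diff_columns(a, b):
    """Return the 1-based column labels where tuples a and b differ."""
    return ['Col{}'.format(i) for i, (x, y) in enumerate(zip(a, b), 1) if x != y]

LoT = [('H', 'C', 'G'), ('H', 'M', 'G'), ('L', 'M', 'S')]

# combinations(LoT, 2) gives (t1, t2), (t1, t3), (t2, t3) in input order
result = [diff_columns(a, b) for a, b in combinations(LoT, 2)]
print(result)  # [['Col2'], ['Col1', 'Col2', 'Col3'], ['Col1', 'Col3']]
```

Unlike the `frozenset` pairs above, the pairs here keep their input order, so each result can be matched back to its pair by position.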

Edit #2

If you want header names rather than computed column labels:

>>> theader=['tuple col 1', 'col 2', 'the third' ] 
>>> [[theader[i] for i, t in enumerate(zip(*pair)) if t[0]!=t[1]] for pair 
...  in pairs(LoT)] 
[['col 2'], ['tuple col 1', 'col 2', 'the third'], ['tuple col 1', 'the third']]

If you want (what I suspect is the right answer) a list of dicts:

>>> di=[] 
>>> for pair in pairs(LoT):  
...     di.append({repr(list(pair)): [theader[i] for i, t in enumerate(zip(*pair)) if t[0]!=t[1]]})
...
>>> di 
[{"[('H', 'M', 'G'), ('H', 'C', 'G')]": ['col 2']}, {"[('L', 'M', 'S'), ('H', 'C', 'G')]": ['tuple col 1', 'col 2', 'the third']}, {"[('H', 'M', 'G'), ('L', 'M', 'S')]": ['tuple col 1', 'the third']}]

Or, just a straight dict of lists:

>>> di={} 
>>> for pair in pairs(LoT):  
...     di[repr(list(pair))]=[theader[i] for i, t in enumerate(zip(*pair)) if t[0]!=t[1]]
...
>>> di 
{"[('H', 'M', 'G'), ('L', 'M', 'S')]": ['tuple col 1', 'the third'], "[('L', 'M', 'S'), ('H', 'C', 'G')]": ['tuple col 1', 'col 2', 'the third'], "[('H', 'M', 'G'), ('H', 'C', 'G')]": ['col 2']}
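
For the output the question literally asks for, a set of sets of attribute names, the inner sets would have to be `frozenset`s, because ordinary sets are not hashable. A rough sketch, assuming `list_attr` holds the header row and `rows` the records as in the question, and that all rows have the same length (the function name `differing_attribute_sets` is invented here, and the header values are borrowed from the example in the comments):

```python
from itertools import combinations

def differing_attribute_sets(list_attr, rows):
    """Collect, over all pairs of distinct rows, the set of attribute names that differ."""
    # Identical rows never differ, so compare each distinct row only once.
    unique_rows = set(map(tuple, rows))
    result = set()
    for a, b in combinations(unique_rows, 2):
        # Walk header and both rows in parallel; keep names where values differ.
        diff = frozenset(name for name, x, y in zip(list_attr, a, b) if x != y)
        result.add(diff)  # never empty here, because a != b
    return result

list_attr = ['Employee', 'Department', 'Manager']   # assumed header row
rows = [['H', 'C', 'G'], ['H', 'M', 'G'], ['L', 'M', 'S']]
skt = differing_attribute_sets(list_attr, rows)
```

This visits each pair once instead of twice, but the work is still quadratic in the number of distinct rows, so 50k records remain expensive. With 15 attributes there are at most 2**15 - 1 = 32767 possible non-empty results, so the result set itself stays small.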

Source

2014-09-03 03:31:35 dawg

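Hey. Thanks for the reply. But I need to compare every tuple with every other tuple, and I have 50k tuples in total. So it is literally not possible to write them out like [(tuple1, tuple2), (tuple1, tuple3), (tuple2, tuple3)]. I don't need to return the differing elements as the output set; I need to return the column names of the differing elements as a set, so I want to use the indices of the differing elements to look up the columns (attributes) in the attribute list (here, list_attr). Can you help with this? – 2014-09-03 03:59:23

Hi. Thanks again. This looks perfect. But I need to use the column names rather than the values in the list. The first row of the dataset holds the column names. For example, if the column names for (H, C, G) are Employee, Department, Manager, then I need to return those. – 2014-09-03 04:29:37

I think you can take it from here. ;-) If you have problems, please ask a new question. `zip` is your friend – dawg 2014-09-03 04:31:06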
