How to find duplicates that are adjacent to each other in a Python list, and list their indices?

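I have a program that reads a .csv file, checks for any mismatch in column length (by comparing each row to the header fields), and then returns everything it found as a list (which is then written to a file). What I want to do with this list is write out the results as follows:

row numbers where the same mismatch is found : the amount of columns in that row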

e.g.

rows: n-m : y

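where n and m are the first and last row numbers of a run of rows that share the same number of columns, and that number does not match the header.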

I have looked into these topics, and while the information was useful, they do not answer the question:

Find and list duplicates in a list?

Identify duplicate values in a list in Python

This is where I am now:

import csv

# assumed: `data` is the open .csv file and `file` is the open output file
r = csv.reader(data, delimiter='\t')
columns = []
for row in r:
    # adds column length to a list
    colm = len(row)
    columns.append(colm)

b = len(columns)
for a in range(b):
    # checks if the current member matches the header length of columns
    if columns[a] != columns[0]:
        # if it doesn't, write the row and the amount of columns in that row to a file
        file.write("row " + str(a + 1) + ": " + str(columns[a]) + " \n")

The file output looks like this:

row 7220: 0 
row 7221: 0 
row 7222: 0 
row 7223: 0 
row 7224: 0 
row 7225: 1 
row 7226: 1

whereas the desired end result is

rows 7220 - 7224 : 0 
rows 7225 - 7226 : 1

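So what I basically need, the way I see it, is a dictionary where the key is the range of rows that share a duplicate value and the value is the amount of columns in that mismatch. What I think I need (in horribly written pseudocode, which no longer makes any sense now that I read it years after writing this question) is this: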

def pseudoList():
    i = 1
    ListOfLists = []
    while (i < len(originalList)):
        duplicateList = []
        if originalList[i] == originalList[i-1]:
            duplicateList.append(originalList[i])
        i += 1
    ListOfLists.append(duplicateList)


def PseudocreateDict(ListOfLists):
    pseudoDict = {}
    for x in ListOfLists:
        a = ListOfLists[x][0]     #this is the first node in the uniqueList created
        i = len(ListOfLists) - 1
        b = listOfLists[x][i] #this is the last node of the uniqueList created
        pseudodict.update('key' : '{} - {}'.format(a,b))

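However, this seems a very convoluted way of doing what I want, so I was wondering whether there is a) a more efficient way or b) an easier way to do this?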

Source

2015-06-25 Christian W.

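You can use a list comprehension to return a list of the elements of the columns list that differ from the element before them; these mark the starting points of your ranges. Then enumerate these ranges and print/write out the ones whose value differs from the first (header) element. One extra element is appended to the list of ranges to mark the end index of the list, which avoids out-of-range indexing.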

columns = [2, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 2, 1]

# [index, value] for every position where the value differs from the previous one
ranges = [[i+1, v] for i, v in enumerate(columns[1:]) if columns[i] != columns[i+1]]
ranges.append([len(columns), 0])  # special case for last element
for i, v in enumerate(ranges[:-1]):
    if v[1] != columns[0]:
        # the range runs from this index up to the start of the next range
        print("rows", v[0]+1, "-", ranges[i+1][0], ":", v[1])

Output:

rows 2 - 5 : 1 
rows 6 - 9 : 0 
rows 10 - 11 : 1 
rows 13 - 13 : 1
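
As a possible alternative to the index arithmetic above, the standard library's itertools.groupby groups consecutive equal values, which is the "adjacent duplicates" case. This is only a sketch and not part of the original answer: the function name mismatch_ranges and the (first_row, last_row, count) tuple layout are made up for illustration.

```python
from itertools import groupby


def mismatch_ranges(columns):
    """Return (first_row, last_row, count) for each run of adjacent rows
    whose column count differs from the header (columns[0]).

    Row numbers are 1-based. Hypothetical helper, for illustration only.
    """
    header = columns[0]
    result = []
    # Pair each count with its 1-based row number, then group consecutive
    # rows that share the same count.
    numbered = enumerate(columns, start=1)
    for count, run in groupby(numbered, key=lambda pair: pair[1]):
        rows = [row for row, _ in run]
        if count != header:
            result.append((rows[0], rows[-1], count))
    return result


columns = [2, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 2, 1]
for first, last, count in mismatch_ranges(columns):
    print("rows", first, "-", last, ":", count)
```

For the sample list this prints the same four lines as shown above. Returning tuples instead of printing directly keeps the grouping separate from the formatting, so the same result could be written to a file instead.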

Source

2015-06-25 10:58:00 samgak

What you want to do is a map/reduce operation, but without the sorting that is normally done between the map and the reduce steps.

If you output

row 7220: 0 
row 7221: 0 
row 7222: 0 
row 7223: 0

to stdout, you can pipe this data to another Python program that generates the groups you want.

The second Python program could look something like this:

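import re
import sys

# the first "row N: M" line starts the first group
line = sys.stdin.readline()
first_rowid, last_diff = re.findall(r'(\d+)', line)
prev_rowid = first_rowid

for line in sys.stdin:
    rowid, diff = re.findall(r'(\d+)', line)
    # a group ends when the value changes or the row numbers stop being adjacent
    if diff != last_diff or int(rowid) != int(prev_rowid) + 1:
        print("rows", first_rowid, "-", prev_rowid, ":", last_diff)
        first_rowid, last_diff = rowid, diff
    prev_rowid = rowid

# print the final group
print("rows", first_rowid, "-", prev_rowid, ":", last_diff)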

In a unix environment you would run them like this to get the output into a file:

python yourprogram.py | python myprogram.py > youroutputfile.dat

If you cannot run this in a unix environment, you can still use the algorithm from my program, with some modifications.
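
One way such modifications might look, sketched under the assumption that the (row number, column count) pairs are already available in memory rather than arriving on stdin: the same single-pass grouping written as a generator. The name group_runs and the pair format are assumptions for illustration, not something the answer specifies.

```python
def group_runs(pairs):
    """Yield (first_row, last_row, count) for runs of consecutive row
    numbers that share the same count.

    `pairs` is an iterable of (row_number, count) tuples in ascending
    row order. Hypothetical helper, for illustration only.
    """
    first = prev = count = None
    for row, value in pairs:
        if first is None:
            # very first pair opens the first run
            first, prev, count = row, row, value
        elif value == count and row == prev + 1:
            prev = row  # extend the current run
        else:
            # value changed or rows are not adjacent: close the run
            yield first, prev, count
            first, prev, count = row, row, value
    if first is not None:
        yield first, prev, count  # flush the final run


pairs = [(7220, 0), (7221, 0), (7222, 0), (7223, 0), (7224, 0),
         (7225, 1), (7226, 1)]
for first, last, count in group_runs(pairs):
    print("rows", first, "-", last, ":", count)
```

With the sample rows from the question this prints "rows 7220 - 7224 : 0" and "rows 7225 - 7226 : 1". No sorting is needed because the rows already arrive in order, which is the point made above about skipping the sort step.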

Source

2015-06-25 10:42:51 firelynx

You can also try the following code:

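b = len(columns)
check = None  # column count of the run currently being tracked (None = no open run)
start = None  # 1-based row number where that run began
for a in range(b):
    if check is not None and columns[a] == check:
        # still inside the same run of mismatching rows
        continue
    if check is not None:
        # the run ended on the previous row, whose 1-based number is a
        if start != a:
            file.write("row " + str(start) + " - " + str(a) + ": " + str(check) + " \n")
        else:
            file.write("row " + str(start) + ": " + str(check) + " \n")
        check = None
    # checks if the current member matches the header length of columns
    if columns[a] != columns[0]:
        # if it doesn't, start a new run at this row
        start = a + 1
        check = columns[a]
# write out the last run if the list ended while it was still open
if check is not None:
    if start != b:
        file.write("row " + str(start) + " - " + str(b) + ": " + str(check) + " \n")
    else:
        file.write("row " + str(start) + ": " + str(check) + " \n")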

Source

2015-06-25 10:57:10
