So, here is how I would do it.
First, read your file and build a 2D numpy array from its contents:
import numpy

with open('test.txt', 'r') as fil:
    lines = fil.readlines()
# Split each line on whitespace to get one list of cells per row
lines = [line.strip().split() for line in lines]
arr = numpy.array(lines)
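Then, check each row for duplicates using a set (a set contains no duplicates, so if the length of the set differs from the length of the row, the row has duplicates):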
for row in arr:
    if len(set(row)) != len(row):
        print('Duplicates in row:', row)
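To see why comparing lengths works, here is a minimal sketch of the set test on its own. The helper name `has_duplicates` and the sample values are only for illustration.

```python
def has_duplicates(items):
    """Return True if any value appears more than once in items."""
    # A set keeps a single copy of each value, so it is shorter than
    # the input exactly when the input contains a repeated value.
    return len(set(items)) != len(items)


print(has_duplicates(['A', 'B', 'C']))  # False
print(has_duplicates(['A', 'A', 'B']))  # True
```

Note that this only tells you that a repeat exists, not which value is repeated.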
Then, check each column for duplicates with the same set test, by iterating over the transpose of your numpy array:
for col in arr.T:
    if len(set(col)) != len(col):
        print('Duplicates in column:', col)
If you wrap all of this in a function:
import numpy

def check_for_duplicates(filename):
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    arr = numpy.array(lines)
    # Rows first, then columns via the transpose
    for row in arr:
        if len(set(row)) != len(row):
            print('Duplicates in row:', row)
    for col in arr.T:
        if len(set(col)) != len(col):
            print('Duplicates in column:', col)
As Apero suggested, you can also use zip (https://docs.python.org/3/library/functions.html#zip) to do this without numpy:
def check_for_duplicates(filename):
    with open(filename, 'r') as fil:
        lines = fil.readlines()
    lines = [line.strip().split() for line in lines]
    for row in lines:
        if len(set(row)) != len(row):
            print('Duplicates in row:', row)
    # zip(*lines) yields the columns as tuples
    for col in zip(*lines):
        if len(set(col)) != len(col):
            print('Duplicates in column:', col)
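For your example, the numpy version prints the following (the zip version reports the same row and column, formatted as a list and a tuple respectively):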
# Duplicates in row: ['A' 'A' 'B']
# Duplicates in column: ['A' 'A' 'B']
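If you would rather get the results back than print them, the same idea can be sketched as a function that returns which values repeat and where. This is only an illustrative variant, not part of the answer above: the name `find_duplicates`, the return shape, and the sample grid are all assumptions, and it uses `collections.Counter` instead of the set-length test so that it can name the repeated values.

```python
from collections import Counter


def find_duplicates(rows):
    """Return (row_hits, col_hits) for a list of equal-length rows.

    Each hit is a tuple (index, sorted list of repeated values).
    """
    def repeated(values):
        # Count every value and keep those seen more than once
        counts = Counter(values)
        return sorted(v for v, n in counts.items() if n > 1)

    row_hits = [(i, repeated(r)) for i, r in enumerate(rows) if repeated(r)]
    # zip(*rows) transposes the grid, yielding one tuple per column
    col_hits = [(j, repeated(c)) for j, c in enumerate(zip(*rows)) if repeated(c)]
    return row_hits, col_hits


# Hypothetical 3x3 grid, already split into cells
grid = [['A', 'A', 'B'],
        ['B', 'C', 'A'],
        ['C', 'B', 'A']]
print(find_duplicates(grid))
# ([(0, ['A'])], [(2, ['A'])])
```

One caveat: zip stops at the shortest row, so if the rows have different lengths the trailing cells of the longer rows are silently ignored in the column check.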
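cols = zip(*lines) would be enough, there is no need for numpy here
@Apero You are absolutely right. I have edited my answer. Thanks.
@JohnPal Check the zip documentation (https://docs.python.org/3/library/functions.html#zip). 'zip' aggregates the elements of the given iterables into tuples. For example, 'x = [1,2,3]; y = [4,5,6]; zip(x,y)' returns '[(1,4),(2,5),(3,6)]' (in Python 3 zip returns an iterator, so wrap it in list() to see this result). To understand what '*lines' means, see this link (http://agiliq.com/blog/2012/06/understanding-args-and-kwargs/)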
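As a small runnable illustration of the two points in the last comment, the sketch below shows zip pairing up two lists and then the `*` unpacking form used on a list of rows. The sample `lines` value is made up for the demonstration.

```python
x = [1, 2, 3]
y = [4, 5, 6]
# list() is needed in Python 3, where zip returns a lazy iterator
pairs = list(zip(x, y))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

lines = [['A', 'A', 'B'], ['B', 'C', 'A']]
# zip(*lines) is the same call as zip(lines[0], lines[1]):
# the star unpacks the outer list into separate arguments
columns = list(zip(*lines))
print(columns)  # [('A', 'B'), ('A', 'C'), ('B', 'A')]
```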