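I want to remove elements from a list of strings (read from a file). Each element is itself a list, stored as a comma-separated string.
I want to remove from the list the strings that have the same elements. For example: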
1:GGSIPU,RANK,BTECH,9
2:GGSIPU,BTECH,RANK,9
3:GGSIPU,BTECH,RANK,9
So lines 2 and 3 should be removed.
Here is my code:
# to remove duplicates
with open('itemset3.txt', 'r') as f:
    lines = f.readlines()
    f.close()
i = 0
while (i<len(lines)):
    j = i + 1
    temp = []
    temp1 = lines[i].split(',')
    print 'outer %d %s' % (i,temp1)
    temp.append(temp1[0])
    temp.append(temp1[1])
    temp.append(temp1[2])
    while (j<len(lines)):
        if all(t in lines[j] for t in temp):
            print temp, ' found at ',j,': ',lines[j]
            # lines.remove(lines[j])
            del lines[j]
        j = j + 1
    i = i + 1
f = open('itemset3.txt', 'w')
i = 0
while (i<len(lines)):
    f.write(lines[i])
    i = i + 1
f.close()
And here is the text file:
GGSIPU,RANK,BTECH,9
GGSIPU,BTECH,RANK,9
GGSIPU,BTECH,RANK,9
GGSIPU,SEMESTER,RANK,9
GGSIPU,CALCULATOR,RANK,9
GGSIPU,CHECK,RANK,7
GGSIPU,Certified,RANK,7
GGSIPU,Winner,RANK,7
GGSIPU,Application,RANK,7
GGSIPU,Techexpo2015,RANK,7
GGSIPU,Students,RANK,6
RANK,BTECH,GGSIPU,9
RANK,BTECH,GGSIPU,9
RANK,BTECH,GGSIPU,9
RANK,SEMESTER,GGSIPU,9
RANK,SEMESTER,GGSIPU,9
RANK,CALCULATOR,GGSIPU,9
RANK,CALCULATOR,GGSIPU,9
RANK,CHECK,GGSIPU,7
RANK,CHECK,GGSIPU,7
RANK,Certified,GGSIPU,7
RANK,Certified,GGSIPU,7
RANK,Winner,GGSIPU,7
RANK,Winner,GGSIPU,7
RANK,Application,GGSIPU,7
RANK,Application,GGSIPU,7
RANK,Techexpo2015,GGSIPU,7
RANK,Techexpo2015,GGSIPU,7
RANK,Students,GGSIPU,6
RANK,Students,GGSIPU,6
BTECH,SEMESTER,GGSIPU,9
BTECH,CALCULATOR,GGSIPU,9
SEMESTER,CALCULATOR,GGSIPU,9
CHECK,Certified,GGSIPU,7
CHECK,Winner,GGSIPU,7
CHECK,Application,GGSIPU,7
CHECK,Techexpo2015,GGSIPU,7
CHECK,Students,GGSIPU,6
Certified,Winner,GGSIPU,7
Certified,Application,GGSIPU,7
Certified,Techexpo2015,GGSIPU,7
Certified,Students,GGSIPU,6
Winner,Application,GGSIPU,7
Winner,Techexpo2015,GGSIPU,7
Winner,Students,GGSIPU,6
Application,Techexpo2015,GGSIPU,7
Application,Students,GGSIPU,6
Techexpo2015,Students,GGSIPU,6
The problem is that after running the code, some redundant (duplicate) lines still appear in the output. How should I fix it?
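One likely cause, assuming `j = j + 1` runs on every pass of the inner loop, including right after `del lines[j]`: the delete shifts the next line into position `j`, so advancing the index immediately skips that line without checking it. Below is a minimal Python 3 sketch of the effect on a made-up list. Both function names are hypothetical and only illustrate the indexing pattern, not the full comparison logic from the question.

```python
def remove_matches_skipping(items, target):
    """Delete matches but always advance the index (reproduces the skip)."""
    items = list(items)
    j = 0
    while j < len(items):
        if items[j] == target:
            del items[j]
        j += 1  # runs even after a delete, so the shifted-in element is skipped
    return items


def remove_matches(items, target):
    """Delete matches and advance the index only when nothing was deleted."""
    items = list(items)
    j = 0
    while j < len(items):
        if items[j] == target:
            del items[j]  # the next element now sits at index j
        else:
            j += 1
    return items


data = ["a", "x", "x", "b", "x", "x"]
print(remove_matches_skipping(data, "x"))  # some "x" entries survive
print(remove_matches(data, "x"))           # all "x" entries are gone
```

With consecutive duplicates, as in the text file above, the first variant leaves every second duplicate behind, which would match the symptom described.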
Here is the output after converting the lines to tuples:
('Certified', 'Winner', 'GGSIPU', '7')
('RANK', 'Application', 'GGSIPU', '7')
('Techexpo2015', 'Students', 'GGSIPU', '6')
('CHECK', 'Certified', 'GGSIPU', '7')
('RANK', 'SEMESTER', 'GGSIPU', '9')
('Application', 'Techexpo2015', 'GGSIPU', '7')
('GGSIPU', 'SEMESTER', 'RANK', '9')
('CHECK', 'Techexpo2015', 'GGSIPU', '7')
('RANK', 'Winner', 'GGSIPU', '7')
('CHECK', 'Winner', 'GGSIPU', '7')
('Winner', 'Students', 'GGSIPU', '6')
('GGSIPU', 'Winner', 'RANK', '7')
('GGSIPU', 'BTECH', 'RANK', '9')
('RANK', 'Techexpo2015', 'GGSIPU', '7')
('Certified', 'Students', 'GGSIPU', '6')
('GGSIPU', 'CHECK', 'RANK', '7')
('RANK', 'BTECH', 'GGSIPU', '9')
('GGSIPU', 'Students', 'RANK', '6')
('RANK', 'CALCULATOR', 'GGSIPU', '9')
('Winner', 'Techexpo2015', 'GGSIPU', '7')
('GGSIPU', 'Certified', 'RANK', '7')
('RANK', 'CHECK', 'GGSIPU', '7')
('CHECK', 'Application', 'GGSIPU', '7')
('RANK', 'Certified', 'GGSIPU', '7')
('GGSIPU', 'RANK', 'BTECH', '9')
('GGSIPU', 'CALCULATOR', 'RANK', '9')
('CHECK', 'Students', 'GGSIPU', '6')
('GGSIPU', 'Application', 'RANK', '7')
('GGSIPU', 'Techexpo2015', 'RANK', '7')
('Winner', 'Application', 'GGSIPU', '7')
('BTECH', 'SEMESTER', 'GGSIPU', '9')
('Certified', 'Techexpo2015', 'GGSIPU', '7')
('RANK', 'Students', 'GGSIPU', '6')
('SEMESTER', 'CALCULATOR', 'GGSIPU', '9')
('Certified', 'Application', 'GGSIPU', '7')
('Application', 'Students', 'GGSIPU', '6')
('BTECH', 'CALCULATOR', 'GGSIPU', '9')
Lines like the following still remain:
1: ('GGSIPU', 'Application', 'RANK', '7')
2: ('RANK', 'Application', 'GGSIPU', '7')
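These two rows hold the same fields in a different order. Tuples compare position by position, so a set of tuples treats them as distinct. Here is a hedged Python 3 sketch of an order-insensitive alternative, assuming "same elements" means the same comma-separated fields regardless of order. `dedupe_rows` is a hypothetical helper, and it uses the sorted tuple of fields as the lookup key.

```python
def dedupe_rows(lines):
    """Keep the first line seen for each multiset of comma-separated fields."""
    seen = set()
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        key = tuple(sorted(stripped.split(",")))  # order-insensitive key
        if key not in seen:
            seen.add(key)
            kept.append(stripped)
    return kept


sample = [
    "GGSIPU,Application,RANK,7\n",
    "RANK,Application,GGSIPU,7\n",
    "RANK,Application,GGSIPU,7\n",
    "Winner,Application,GGSIPU,7\n",
]
print(dedupe_rows(sample))
```

Because the key is built from whole fields after `split(",")`, this also avoids substring matches such as `t in lines[j]`, where a field name could match part of a longer field. To apply it to the file, the lines read from `itemset3.txt` would be passed in and the kept rows written back with a newline after each.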
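I see a problem statement, a code sample, and sample input, but no question. –
@Two-BitAlchemist 'I want to remove from the list the strings that have the same elements' –
The whole point of using 'with' when opening a file is that the context manager closes the file for you. – chepner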
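A small Python 3 sketch of that last point, using a temporary file so it does not depend on `itemset3.txt` (the file contents and variable names here are illustrative assumptions): once the `with` block ends, the file object reports itself as closed, so an explicit `f.close()` is redundant.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on itemset3.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("GGSIPU,RANK,BTECH,9\n")

with open(path) as f:
    lines = f.readlines()

# The with block has already closed the file at this point.
closed_after_with = f.closed
print(closed_after_with, lines)

os.remove(path)  # clean up the temporary file
```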