Python: removing elements from a list of strings

-1

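I want to remove elements from a list of strings (read from a file). Each element is itself a list, stored as a comma-separated string.

I want to remove from the list the strings that have the same elements. For example:

1: GGSIPU,RANK,BTECH,9

2: GGSIPU,BTECH,RANK,9

3: GGSIPU,BTECH,RANK,9

So lines 2 and 3 should be removed.

Here is my code: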

# to remove duplicates 

with open('itemset3.txt', 'r') as f: 
    lines = f.readlines() 
    f.close() 

i = 0 

while (i<len(lines)): 
    j = i + 1 
    temp = [] 
    temp1 = lines[i].split(',') 
    print 'outer %d %s' % (i,temp1) 
    temp.append(temp1[0]) 
    temp.append(temp1[1]) 
    temp.append(temp1[2]) 
    while (j<len(lines)): 
        if all(t in lines[j] for t in temp): 
            print temp, ' found at ',j,': ',lines[j] 
            # lines.remove(lines[j]) 
            del lines[j] 
        j = j + 1 
    i = i + 1 

f = open('itemset3.txt', 'w') 
i = 0 
while (i<len(lines)): 
    f.write(lines[i]) 
    i = i + 1 
f.close()

And here is the text file:

GGSIPU,RANK,BTECH,9 
GGSIPU,BTECH,RANK,9 
GGSIPU,BTECH,RANK,9 
GGSIPU,SEMESTER,RANK,9 
GGSIPU,CALCULATOR,RANK,9 
GGSIPU,CHECK,RANK,7 
GGSIPU,Certified,RANK,7 
GGSIPU,Winner,RANK,7 
GGSIPU,Application,RANK,7 
GGSIPU,Techexpo2015,RANK,7 
GGSIPU,Students,RANK,6 
RANK,BTECH,GGSIPU,9 
RANK,BTECH,GGSIPU,9 
RANK,BTECH,GGSIPU,9 
RANK,SEMESTER,GGSIPU,9 
RANK,SEMESTER,GGSIPU,9 
RANK,CALCULATOR,GGSIPU,9 
RANK,CALCULATOR,GGSIPU,9 
RANK,CHECK,GGSIPU,7 
RANK,CHECK,GGSIPU,7 
RANK,Certified,GGSIPU,7 
RANK,Certified,GGSIPU,7 
RANK,Winner,GGSIPU,7 
RANK,Winner,GGSIPU,7 
RANK,Application,GGSIPU,7 
RANK,Application,GGSIPU,7 
RANK,Techexpo2015,GGSIPU,7 
RANK,Techexpo2015,GGSIPU,7 
RANK,Students,GGSIPU,6 
RANK,Students,GGSIPU,6 
BTECH,SEMESTER,GGSIPU,9 
BTECH,CALCULATOR,GGSIPU,9 
SEMESTER,CALCULATOR,GGSIPU,9 
CHECK,Certified,GGSIPU,7 
CHECK,Winner,GGSIPU,7 
CHECK,Application,GGSIPU,7 
CHECK,Techexpo2015,GGSIPU,7 
CHECK,Students,GGSIPU,6 
Certified,Winner,GGSIPU,7 
Certified,Application,GGSIPU,7 
Certified,Techexpo2015,GGSIPU,7 
Certified,Students,GGSIPU,6 
Winner,Application,GGSIPU,7 
Winner,Techexpo2015,GGSIPU,7 
Winner,Students,GGSIPU,6 
Application,Techexpo2015,GGSIPU,7 
Application,Students,GGSIPU,6 
Techexpo2015,Students,GGSIPU,6

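The problem is that after running the code, the output still contains some redundant (duplicate) lines. How should I correct it?

Here is the output after making tuples: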

('Certified', 'Winner', 'GGSIPU', '7') 
('RANK', 'Application', 'GGSIPU', '7') 
('Techexpo2015', 'Students', 'GGSIPU', '6') 
('CHECK', 'Certified', 'GGSIPU', '7') 
('RANK', 'SEMESTER', 'GGSIPU', '9') 
('Application', 'Techexpo2015', 'GGSIPU', '7') 
('GGSIPU', 'SEMESTER', 'RANK', '9') 
('CHECK', 'Techexpo2015', 'GGSIPU', '7') 
('RANK', 'Winner', 'GGSIPU', '7') 
('CHECK', 'Winner', 'GGSIPU', '7') 
('Winner', 'Students', 'GGSIPU', '6') 
('GGSIPU', 'Winner', 'RANK', '7') 
('GGSIPU', 'BTECH', 'RANK', '9') 
('RANK', 'Techexpo2015', 'GGSIPU', '7') 
('Certified', 'Students', 'GGSIPU', '6') 
('GGSIPU', 'CHECK', 'RANK', '7') 
('RANK', 'BTECH', 'GGSIPU', '9') 
('GGSIPU', 'Students', 'RANK', '6') 
('RANK', 'CALCULATOR', 'GGSIPU', '9') 
('Winner', 'Techexpo2015', 'GGSIPU', '7') 
('GGSIPU', 'Certified', 'RANK', '7') 
('RANK', 'CHECK', 'GGSIPU', '7') 
('CHECK', 'Application', 'GGSIPU', '7') 
('RANK', 'Certified', 'GGSIPU', '7') 
('GGSIPU', 'RANK', 'BTECH', '9') 
('GGSIPU', 'CALCULATOR', 'RANK', '9') 
('CHECK', 'Students', 'GGSIPU', '6') 
('GGSIPU', 'Application', 'RANK', '7') 
('GGSIPU', 'Techexpo2015', 'RANK', '7') 
('Winner', 'Application', 'GGSIPU', '7') 
('BTECH', 'SEMESTER', 'GGSIPU', '9') 
('Certified', 'Techexpo2015', 'GGSIPU', '7') 
('RANK', 'Students', 'GGSIPU', '6') 
('SEMESTER', 'CALCULATOR', 'GGSIPU', '9') 
('Certified', 'Application', 'GGSIPU', '7') 
('Application', 'Students', 'GGSIPU', '6') 
('BTECH', 'CALCULATOR', 'GGSIPU', '9')

Lines like the following still remain:

1: ('GGSIPU', 'Application', 'RANK', '7')

2: ('RANK', 'Application', 'GGSIPU', '7')

Source

2015-10-29 TheLinuxEvangelist

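I see a problem statement, a code sample, and sample input, but no question.

@Two-BitAlchemist 'I want to remove from the list the strings that have the same elements'

The whole point of using 'with' when opening a file is that the context manager closes the file for you. – chepner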
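On the loop in the question: `j` is advanced even after `del lines[j]`, so the line that slides into position `j` is never compared, and `t in lines[j]` is a substring test on the whole line rather than a field comparison. Here is a minimal sketch of the same pairwise approach with both points changed. It assumes two lines are duplicates when all of their comma-separated fields match in any order (the original code compared only the first three fields), and the name `remove_duplicates` is made up for illustration.

```python
def remove_duplicates(lines):
    """Drop later lines whose fields match an earlier line in any order."""
    lines = list(lines)  # work on a copy so the caller's list is untouched
    i = 0
    while i < len(lines):
        fields = sorted(lines[i].strip().split(','))
        j = i + 1
        while j < len(lines):
            if sorted(lines[j].strip().split(',')) == fields:
                # The next line slides into index j, so do not advance j here.
                del lines[j]
            else:
                j += 1
        i += 1
    return lines


sample = [
    "GGSIPU,RANK,BTECH,9\n",
    "GGSIPU,BTECH,RANK,9\n",
    "GGSIPU,BTECH,RANK,9\n",
    "GGSIPU,SEMESTER,RANK,9\n",
]
result = remove_duplicates(sample)
print(result)
```

With the sample above, only the first and last lines should survive. The nested loops are quadratic in the number of lines, which is fine for a file of this size.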

-1

Converting the lines into tuples and making a set:

from pprint import pprint as pp

allLines = set()

with open('data') as f:
    for line in f:
        line = line.strip()
        line = tuple(line.split(','))  # tuples are hashable, so they can go in a set
        allLines.add(line)

pp(allLines)



{('Application', 'Students', 'GGSIPU', '6'), 
('Application', 'Techexpo2015', 'GGSIPU', '7'), 
('BTECH', 'CALCULATOR', 'GGSIPU', '9'), 
('BTECH', 'SEMESTER', 'GGSIPU', '9'), 
('CHECK', 'Application', 'GGSIPU', '7'), 
('CHECK', 'Certified', 'GGSIPU', '7'), 
('CHECK', 'Students', 'GGSIPU', '6'), 
('CHECK', 'Techexpo2015', 'GGSIPU', '7'), 
('CHECK', 'Winner', 'GGSIPU', '7'), 
('Certified', 'Application', 'GGSIPU', '7'), 
('Certified', 'Students', 'GGSIPU', '6'), 
('Certified', 'Techexpo2015', 'GGSIPU', '7'), 
('Certified', 'Winner', 'GGSIPU', '7'), 
('GGSIPU', 'Application', 'RANK', '7'), 
('GGSIPU', 'BTECH', 'RANK', '9'), 
('GGSIPU', 'CALCULATOR', 'RANK', '9'), 
('GGSIPU', 'CHECK', 'RANK', '7'), 
('GGSIPU', 'Certified', 'RANK', '7'), 
('GGSIPU', 'RANK', 'BTECH', '9'), 
('GGSIPU', 'SEMESTER', 'RANK', '9'), 
('GGSIPU', 'Students', 'RANK', '6'), 
('GGSIPU', 'Techexpo2015', 'RANK', '7'), 
('GGSIPU', 'Winner', 'RANK', '7'), 
('RANK', 'Application', 'GGSIPU', '7'), 
('RANK', 'BTECH', 'GGSIPU', '9'), 
('RANK', 'CALCULATOR', 'GGSIPU', '9'), 
('RANK', 'CHECK', 'GGSIPU', '7'), 
('RANK', 'Certified', 'GGSIPU', '7'), 
('RANK', 'SEMESTER', 'GGSIPU', '9'), 
('RANK', 'Students', 'GGSIPU', '6'), 
('RANK', 'Techexpo2015', 'GGSIPU', '7'), 
('RANK', 'Winner', 'GGSIPU', '7'), 
('SEMESTER', 'CALCULATOR', 'GGSIPU', '9'), 
('Techexpo2015', 'Students', 'GGSIPU', '6'), 
('Winner', 'Application', 'GGSIPU', '7'), 
('Winner', 'Students', 'GGSIPU', '6'), 
('Winner', 'Techexpo2015', 'GGSIPU', '7')}

Source

2015-10-29 15:51:33 LetzerWille

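# Raw string so the backslashes in the Windows path are not treated as escapes.
with open(r'C:\Users\DELL\Documents\itemset3.txt', 'r') as f:
    lines = f.readlines()  # the with block closes the file on exit

linesUp = []
for line in lines:
    # Drop the newline, then split the line into a tuple of fields.
    linesUp.append(tuple(line.replace("\n", "").split(',')))

setOfLines = set(linesUp)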

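I built tuples from the strings by splitting on `,` and put them in a list. Then I created a set from that list, which eliminates the duplicates.

I used replace on each line string because, surprisingly, a few of the strings in your data have no trailing newline.

I tested it on a small data set. I hope it works for you.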
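To illustrate the newline point: if the line ending is left in place, the last field of a line that ends in `\n` differs from the same field on a line without one, so two otherwise identical lines become different tuples. A small sketch of this, where `to_fields` is a hypothetical helper name and `rstrip("\r\n")` is used in place of `replace`:

```python
def to_fields(line):
    """Split one comma-separated line into a tuple, ignoring the line ending."""
    return tuple(line.rstrip("\r\n").split(','))


with_newline = "RANK,Students,GGSIPU,6\n"
without_newline = "RANK,Students,GGSIPU,6"  # e.g. the last line of a file

# Without stripping, '6\n' and '6' are different strings, so both tuples are kept.
naive = {tuple(with_newline.split(',')), tuple(without_newline.split(','))}
# After stripping, the two lines produce the same tuple.
cleaned = {to_fields(with_newline), to_fields(without_newline)}
print(len(naive), len(cleaned))
```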

Source

2015-10-29 16:08:43 saikumarm

Will this solve my problem? – TheLinuxEvangelist

Yes, it does solve your problem – saikumarm

It did not solve the problem.. there are still duplicate tuples.. – TheLinuxEvangelist
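The tuples that remain hold the same fields in a different order. Tuples compare position by position, so a plain set of tuples treats ('GGSIPU', 'Application', 'RANK', '7') and ('RANK', 'Application', 'GGSIPU', '7') as distinct. One possible way around this is to build an order-independent key for each line. The sketch below assumes that lines count as duplicates when they contain the same fields in any order, and the name `unique_itemsets` is invented for this example.

```python
def unique_itemsets(lines):
    """Keep the first line for each distinct group of fields, ignoring order."""
    seen = set()
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key = tuple(sorted(line.split(',')))  # same fields in any order -> same key
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


rows = [
    "GGSIPU,Application,RANK,7",
    "RANK,Application,GGSIPU,7",
    "RANK,Application,GGSIPU,7",
    "CHECK,Winner,GGSIPU,7",
]
unique = unique_itemsets(rows)
print(unique)
```

A `frozenset` of the fields would also work as the key, but it would treat a field that appears twice in a line as appearing once. The sorted tuple avoids that, and appending to `kept` preserves the order in which lines first appear.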
