Python: comparing two large lists and processing one of them

I have two text files. a.txt contains lines like these (about one million lines):

991000000019999998,b10000021, 
991000000019703408,b10000021, 
991000545455435408,b10000045, 
991000000029703408,b10000045, 
...

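The first part is the barcode (991000000019703408) and the second part is the bib_number (b10000021). Note that a bib_number may be repeated across lines, but every barcode is unique, so I don't think using set() is a good idea. The other file, b.txt, contains only bib_numbers (about 600,000 records).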

Now I want to compare the two files: for each line in a.txt, if its bib_number (such as b10000045) is not in b.txt, the whole line should be written to c.txt, for example (991000000029703408,b10000045,).

I wrote the code below, but it had still produced no result after 20 minutes.

import re

with open("a.txt", "r") as f1, open("b.txt", "r") as f2, open("c.txt", "w") as f3:
    total_bb = f1.readlines()
    list_match = f2.readlines()  # a list, so every "not in" test below scans it linearly
    for item_bb in total_bb:
        recordList = re.split(",", item_bb)
        item_bb_w = recordList[1] + '\n'  # re-add the newline to match the raw lines of b.txt
        if item_bb_w not in list_match:
            f3.write(item_bb)

Is there any trick for comparing two large lists like these? Thanks.

Source

2017-03-17 Evan Zhang

Do you care about order? If not, just turn them into `set`s; `set1 - set2` will give you all the missing numbers. – AChampion
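A minimal sketch of the set-difference idea from this comment, using a few sample rows from the question in place of the real files. The variable names are illustrative, and b.txt is assumed to hold one bib_number per line. Note that the difference contains only bib_numbers, so on its own it does not recover the whole lines of a.txt.

```python
# Sample rows copied from the question; the real files are far larger.
a_lines = [
    "991000000019999998,b10000021,",
    "991000000019703408,b10000021,",
    "991000545455435408,b10000045,",
    "991000000029703408,b10000045,",
]
# Assumption: each line of b.txt is a single bib_number.
b_lines = ["b10000021"]

# Collect the distinct bib_numbers on each side.
bibs_in_a = {line.split(",")[1].strip() for line in a_lines}
bibs_in_b = {line.strip() for line in b_lines}

# Set difference: bib_numbers that appear in a but not in b.
missing = bibs_in_a - bibs_in_b
print(missing)
```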

Use a set; membership lookups are O(1) on average:

with open("a.txt", "r") as f1, open("b.txt", "r") as f2, open("c.txt", "w") as f3:
    bs = set(b.strip() for b in f2)  # every bib_number from b.txt
    for a in f1:
        x = a.split(',')
        if x[1].strip() not in bs:
            f3.write(a)  # write the whole line from a.txt
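The same approach can be tried on in-memory data. This is only a sketch: the function name `lines_with_missing_bib` and the sample lists are hypothetical, and b.txt is assumed to hold one bib_number per line. Only the bib_numbers of b.txt go into the set, so repeated bib_numbers in a.txt are not collapsed and every matching line is kept in full. For comparison, `not in` on a list scans the list each time, which for a million lines against 600,000 entries can mean hundreds of billions of string comparisons.

```python
def lines_with_missing_bib(a_lines, b_lines):
    """Return the lines of a_lines whose bib_number does not occur in b_lines."""
    known = {line.strip() for line in b_lines}  # set gives fast membership tests
    kept = []
    for line in a_lines:
        fields = line.split(",")
        # Skip malformed lines that have no second field.
        if len(fields) > 1 and fields[1].strip() not in known:
            kept.append(line)  # keep the whole original line
    return kept


# Sample rows copied from the question.
a_sample = [
    "991000000019999998,b10000021,\n",
    "991000000019703408,b10000021,\n",
    "991000545455435408,b10000045,\n",
    "991000000029703408,b10000045,\n",
]
b_sample = ["b10000021\n"]

result = lines_with_missing_bib(a_sample, b_sample)
for line in result:
    print(line, end="")
```

Both lines carrying b10000045 are returned, even though the bib_number repeats.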

I would also look at the csv module for reading comma-separated values.
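As a rough illustration of the csv suggestion, the filter might look like the sketch below. The function name `filter_rows` is hypothetical, `io.StringIO` objects stand in for the real files, and b.txt is again assumed to hold one bib_number per line. With real files, the csv documentation recommends opening them with `newline=""`.

```python
import csv
import io


def filter_rows(a_file, b_file, out_file):
    """Copy rows of a_file whose second column is absent from b_file."""
    known = {line.strip() for line in b_file}
    reader = csv.reader(a_file)
    writer = csv.writer(out_file, lineterminator="\n")
    for row in reader:
        # row[1] is the bib_number; skip rows that are too short.
        if len(row) > 1 and row[1].strip() not in known:
            writer.writerow(row)


# In-memory stand-ins for a.txt, b.txt and c.txt.
a_file = io.StringIO(
    "991000000019999998,b10000021,\n"
    "991000000019703408,b10000021,\n"
    "991000545455435408,b10000045,\n"
    "991000000029703408,b10000045,\n"
)
b_file = io.StringIO("b10000021\n")
out = io.StringIO()

filter_rows(a_file, b_file, out)
print(out.getvalue(), end="")
```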

Source

2017-03-17 03:52:20 AChampion

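Note that a bib_number may be repeated across lines, but the barcode is unique (I edited the question), so I don't think using set() is a good idea: set() would remove identical elements. I need to output the whole line, not just the bib_number. Thanks –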

Updated to write out the whole line `a`. – AChampion

@AChampion shouldn't `if x[1].strip() not in b` be `if x[1].strip() not in bs`? – plasmon360
