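Delete duplicate combinations of words in a text file with Python
With the help of eumiro's answer to "Delete duplicate rows in textfile - except it contains a "{" or "}"", I was able to remove the duplicate lines from a large text file. That was a big step: the file went from 60 MB down to 3 MB.
But now I want to remove duplicate words like these: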
@INBOOK{Miller1992,
author = {Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
R. Leary and Miller, Rowland S. und Mark R. Leary and Miller, Rowland
S. und Mark R. Leary and Miller, Rowland S. und Mark R. Leary and
Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
Miller, Rowland S. und Mark R. Leary},
year = {1992},
editor = {Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun
A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun A.
van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun A. van
Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk
and Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and
Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk and Teun
and Teun A. van Dijk and Teun A. van Dijk and Teun A. van Dijk},
title = {Handbook of discourse analysis (Bd. 3/4)},
The result should look like this:
@INBOOK{Miller1992,
author = {Miller, Rowland S. und Mark R. Leary},
year = {1992},
editor = {Teun A. van Dijk},
title = {Handbook of discourse analysis (Bd. 3/4)},
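The text file has 70,000 lines, and the same author names can appear in several entries. So only the duplicates inside a single pair of curly braces (which can span multiple lines) should be removed: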
author = {Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
R. Leary and Miller, Rowland S. und Mark R. Leary and Miller, Rowland
S. und Mark R. Leary and Miller, Rowland S. und Mark R. Leary and
Miller, Rowland S. und Mark R. Leary and Miller, Rowland S. und Mark
Miller, Rowland S. und Mark R. Leary},
I want to modify my Python script, which removes duplicate lines, so that it removes the duplicate words inside the curly braces instead, but I'm stuck:
    words_seen = set()  # holds words already seen
    outfile = open("literatur_clean.txt", "w")
    for line in open("literatur_dupl.txt", "r"):
        if '{' in line or '}' in line:
            # some code to check whether the words are duplicate
            pass
    outfile.close()
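One possible direction, as a minimal sketch rather than a definitive solution: because a field can wrap over several lines, it may be easier to work on the whole text than line by line. The sketch below assumes that only the `author` and `editor` fields need cleaning, that names are separated by the word "and", and that the field values contain no nested braces. The names `FIELD_RE`, `dedupe_names`, `clean_bibtex` and `clean_file` are made up for illustration, and the UTF-8 encoding is a guess.

```python
import re

# Assumption: only author/editor fields are cleaned, and their values hold no
# nested braces. [^{}]* also matches newlines, so wrapped values are covered.
FIELD_RE = re.compile(
    r"^([ \t]*(?:author|editor)[ \t]*=[ \t]*)\{([^{}]*)\}",
    re.MULTILINE | re.IGNORECASE,
)


def dedupe_names(value):
    """Join wrapped lines, split on ' and ', keep the first copy of each name."""
    names = " ".join(value.split()).split(" and ")
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return " and ".join(unique)


def clean_bibtex(text):
    """Return text with duplicate names removed from each author/editor field."""
    return FIELD_RE.sub(
        lambda m: m.group(1) + "{" + dedupe_names(m.group(2)) + "}", text
    )


def clean_file(src, dst):
    """Read src completely, clean it, and write the result to dst."""
    # Assumption: the file is UTF-8 and small enough (a few MB) to fit in memory.
    with open(src, encoding="utf-8") as infile:
        text = infile.read()
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(clean_bibtex(text))
```

With the file names from the script above, this would be called as `clean_file("literatur_dupl.txt", "literatur_clean.txt")`. Note that a name is only recognised as a duplicate if it is spelled identically once the line breaks have been joined, so a field whose wrapped lines were damaged earlier may still need manual cleanup.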
Thanks for your answers. The first approach doesn't seem to fit well, but I'll try the second one. – StandardNerd