Remove both copies of a duplicated line (not just the extra copy) from a text file?


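By this I mean: erase every line in the text file that is duplicated, not just the extra copies. Both the original line and its duplicates should go, which would leave me with a list of only the lines that never had a duplicate. Maybe a regular expression could do this in Notepad++? If so, which one? Are there any other methods?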


2011-03-11 user656022

Do you have any programming language installed? A "scripting" language is best for a task like this. If so, which one? Which languages do you prefer?

If you are on a Unix-like system, you can use the uniq command. Its -u option prints only the lines that are not repeated.

$ cat test.file
ezra
ezra
john
user
$ uniq -u test.file
john
user

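Note that the identical lines are adjacent here. uniq only compares neighbouring lines, so if the duplicates are not adjacent you have to sort the file first.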

$ cat test.file
ezra
john
ezra
user
$ uniq -u test.file
ezra
john
ezra
user
$ sort test.file | uniq -u
john
user
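
If no Unix tools are at hand, the same idea can be sketched in a scripting language. The following is a minimal Python sketch, not part of the original answer; the function name lines_without_duplicates is made up for illustration. It counts every line and keeps only those that occur exactly once. Unlike sort | uniq -u, it keeps the surviving lines in their original order.

```python
from collections import Counter


def lines_without_duplicates(text: str) -> list[str]:
    """Return the lines that occur exactly once, in their original order."""
    lines = text.splitlines()
    counts = Counter(lines)  # number of occurrences of each distinct line
    return [line for line in lines if counts[line] == 1]


# Same data as the unsorted test.file above.
sample = "ezra\njohn\nezra\nuser\n"
print("\n".join(lines_without_duplicates(sample)))
```

Because the counting covers the whole input, the duplicates do not need to be adjacent and no sorting step is required.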


2011-03-11 21:47:47 Ezra

If you aren't, I recommend Cygwin. – zebediah49

GnuWin is great too. – Ezra

If you have access to something that supports PCRE-style regular expressions, this is straightforward:

s/(?:^|(?<=\n))(.*)\n(?:\1(?:\n|$))+//g

(?:^|(?<=\n))  # Behind us is beginning of string or newline 
(.*)\n   # Capture group 1: all characters up until next newline 
(?:    # Start non-capture group 
    \1    # backreference to what was captured in group 1 
    (?:\n|$)   # a newline or end of string 
)+    # End non-capture group, do this 1 or more times

In context, with the whole text held in a single string:

use strict; use warnings; 

my $str = 
'hello 
this is 
this is 
this is 
that is'; 

$str =~ s/ 
      (?:^|(?<=\n)) 
      (.*)\n 
      (?: 
       \1 
       (?:\n|$) 
     )+ 
    //xg; 

print "'$str'\n"; 

__END__

Output:

'hello
that is'
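
For readers without Perl, here is a rough port of the same substitution to Python's re module. It is only a sketch under the assumption that the whole file fits in one string, and the names ADJACENT_DUPLICATES and drop_adjacent_duplicates are invented for this example. Like uniq, the pattern only catches runs of adjacent identical lines, so unsorted input would need sorting first.

```python
import re

# Same pattern as the Perl substitution, written in verbose mode.
ADJACENT_DUPLICATES = re.compile(
    r"""
    (?:^|(?<=\n))   # start of the string, or just after a newline
    (.*)\n          # group 1: one whole line, then its newline
    (?:
        \1          # the same line again
        (?:\n|$)    # followed by a newline or the end of the string
    )+              # one or more repeats
    """,
    re.VERBOSE,
)


def drop_adjacent_duplicates(text: str) -> str:
    """Delete every run of identical adjacent lines, originals included."""
    return ADJACENT_DUPLICATES.sub("", text)


sample = "hello\nthis is\nthis is\nthis is\nthat is"
print(repr(drop_adjacent_duplicates(sample)))
```

For input whose duplicates are scattered, joining the sorted lines before calling the function, for example "\n".join(sorted(text.splitlines())), brings the duplicates together at the cost of losing the original order.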


2011-03-12 00:04:05 sln

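If the 'm' (multi-line) mode is turned on, I think the expression '(?:^|(?<=\n))' can be simplified to '^'. – ridgerunner

Hey, thanks mate. Which regex editor do you recommend? I have EditPad Pro, RegexBuddy, and Notepad++. Also, how do I get this kind of expression, input and output, into these editors? I tried it in all three, but I clearly don't know what I'm doing. Something with a split-screen view would be great (a bit like Dreamweaver). – user656022

@ridgerunner - Yes, it could use just '/^..mg', but explaining multi-line mode sometimes turns into a headache. – sln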
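
The simplification discussed in these comments can be checked with a small sketch. It uses Python's re module rather than Perl, which is an assumption made purely for illustration, and compares the two anchors on the sample string from the answer. With re.MULTILINE, ^ matches at the start of every line, so the lookbehind is no longer needed.

```python
import re

SAMPLE = "hello\nthis is\nthis is\nthis is\nthat is"

# Original anchor: start of the string, or a lookbehind for a newline.
explicit = re.compile(r"(?:^|(?<=\n))(.*)\n(?:\1(?:\n|$))+")

# Simplified anchor: in multi-line mode, ^ matches at the start of each line.
multiline = re.compile(r"^(.*)\n(?:\1(?:\n|$))+", re.MULTILINE)

print(repr(explicit.sub("", SAMPLE)))
print(repr(multiline.sub("", SAMPLE)))
```

On this sample both patterns remove the same three lines; the sketch does not try to prove that they agree on every possible input.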
