How to remove duplicate lines with Perl or Bash?

sed -s 's/@/@\t/g' test.txt | uniq -f 1 | sed -s 's/@\t/@/g'

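The first sed splits each email address into two tab-separated fields (name and domain), so that uniq can skip the first field and remove lines with a duplicate domain; the last sed removes the tab again. Note that uniq only compares adjacent lines, so lines with the same domain must be next to each other.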
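
A minimal sketch of the same pipeline wrapped in a Bash function (the name `dedupe_by_domain_uniq` is hypothetical, and the addresses are made up). It inserts a literal tab with Bash's `$'\t'` quoting instead of relying on GNU sed's `\t` escape, and drops the `-s` and `g` flags, which make no difference when each line holds one address. It still assumes that lines with the same domain are already adjacent.

```bash
#!/usr/bin/env bash
# Remove lines whose domain repeats the domain of the line just above it.
# Assumes one address per line and that equal domains are adjacent
# (uniq only compares neighbouring lines).
dedupe_by_domain_uniq() {
  local tab=$'\t'   # a literal tab, so any POSIX sed can use it
  # 1. "user@domain" -> "user@<TAB>domain" (two blank-separated fields)
  # 2. uniq -f 1 skips field 1 ("user@") and compares the rest
  # 3. remove the tab again
  sed "s/@/@${tab}/" "$@" | uniq -f 1 | sed "s/@${tab}/@/"
}

# Usage with made-up addresses:
printf '%s\n' alice@example.com bob@example.com carol@example.org |
  dedupe_by_domain_uniq
# prints:
# alice@example.com
# carol@example.org
```

If the duplicates are scattered through the file, the input has to be grouped by domain first (for example with sort), which changes the line order.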

2012-04-08 19:45:31 alexisdm

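#!/usr/bin/env perl

use strict;
use warnings;
use Email::Address;   # CPAN module for parsing email addresses

my %data;   # domain => list of addresses with that domain

while (my $line = <DATA>) {
    # Take the first whitespace-delimited token and parse it as an address;
    # skip lines that are empty or do not contain a parseable address.
    my ($token) = $line =~ /^(\S+)/ or next;
    my ($addr)  = Email::Address->parse($token) or next;
    push @{ $data{ $addr->host } }, $addr->original;
}

# Hash order is not defined, so domains are printed in no particular order.
for my $addrs (values %data) {
    if (@$addrs > 2) {
        # More than two addresses share this domain: keep only the first.
        print "$addrs->[0]\n";
    }
    else {
        # Two or fewer: keep them all.
        print "$_\n" for @$addrs;
    }
}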

__DATA__ 
[email protected] 
[email protected] 
[email protected] 
[email protected] 
[email protected] 
[email protected] 
[email protected] 
[email protected]
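
The rule in this answer (if more than two addresses share a domain keep only the first, otherwise keep them all) can also be sketched in plain awk, which keeps the input order that the Perl hash loses. This is a swapped-in technique, not the answer's own code; the function name `keep_first_if_crowded` and the sample addresses are hypothetical, and because it reads the file twice it needs a real file rather than a pipe.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each domain, keep every address if the domain
# appears at most twice; otherwise keep only its first address.
# Reads the file twice (pass 1 counts, pass 2 prints), so $1 must be a file.
keep_first_if_crowded() {
  awk -F'@' '
    NR == FNR { count[$2]++; next }   # pass 1: count addresses per domain
    count[$2] <= 2 || !seen[$2]++     # pass 2: print if allowed
  ' "$1" "$1"
}

# Usage with made-up addresses:
tmp=$(mktemp)
printf '%s\n' a@x.com b@x.com c@x.com d@y.com e@y.com f@z.com > "$tmp"
keep_first_if_crowded "$tmp"
# prints a@x.com, d@y.com, e@y.com and f@z.com, one per line
rm -f "$tmp"
```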

2012-04-08 20:10:48

I'm puzzled why your example output contains [email protected] twice, but I assume it's a mistake.

As long as trailing whitespace and more complex forms of email address are not a problem, you can do this simply in Perl with

perl [email protected] -ne 'print unless $seen{$F[1]}++' myfile

Output:

[email protected] 
[email protected] 
[email protected]
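
For comparison, the same "print a line only the first time its domain is seen" idiom is commonly written in awk. This is an awk substitute for the Perl one-liner, not the answer's code, and it shares the same assumptions: one address per line, nothing but the domain after the `@`, and no stray trailing whitespace. The function name and sample addresses are hypothetical.

```bash
#!/usr/bin/env bash
# Print each line only the first time its domain (field 2 when split on "@")
# appears; later lines with the same domain are dropped, order is preserved.
first_per_domain() {
  awk -F'@' '!seen[$2]++' "$@"
}

# Usage with made-up addresses:
printf '%s\n' alice@example.com carol@example.org bob@example.com |
  first_per_domain
# prints:
# alice@example.com
# carol@example.org
```

Unlike the uniq pipeline, this does not need the duplicates to be adjacent, because the `seen` array remembers every domain for the whole run.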

2012-04-08 21:26:22 Borodin

This might work for you:

sed ':a;$!N;s/^\([^@]*@\([^\n]*\)\)\n.*@\2$/\1/;ta;P;D' file
[email protected] 
[email protected] 
[email protected]
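
The one-liner packs a small loop into a single sed script. Below is a commented sketch of the same loop (the function name and sample addresses are hypothetical). It assumes GNU sed, which accepts `\n` inside a bracket expression, and it anchors the match at the `@` and at the end of the appended line so that, say, `barx.com` is not mistaken for `x.com`. Like the original, it only drops a line whose domain matches the line currently held, so duplicates must be consecutive.

```bash
#!/usr/bin/env bash
# Drop lines whose domain equals that of the preceding kept line
# (consecutive duplicates only). Assumes GNU sed.
dedupe_consecutive_domains() {
  sed -e '
    # label for the loop
    :a
    # unless this is the last line, append the next line after a newline
    $!N
    # if the appended line has the same domain as the first, delete it
    s/^\([^@]*@\([^\n]*\)\)\n[^\n]*@\2$/\1/
    # a line was deleted: go back and append another one
    ta
    # otherwise print the first line and restart with the appended one
    P
    D
  ' "$@"
}

# Usage with made-up addresses:
printf '%s\n' a@x.com b@x.com c@barx.com d@y.com e@x.com |
  dedupe_consecutive_domains
# prints:
# a@x.com
# c@barx.com
# d@y.com
# e@x.com
```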

2012-04-09 00:31:12 potong

If you don't mind the order, just use sort:

sort -t '@' -u -k 2,2 your_file

If you do mind the order, do:

gawk '{print NR "@" $0}' your_file | sort -t '@' -u -k 3,3 | sort -t '@' -k 1,1n | cut -d \@ -f 2-
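
The order-preserving pipeline is a decorate, sort, undecorate pattern: number each line, keep one line per domain, restore the original order, then strip the numbers. A commented sketch follows (function name and sample addresses are hypothetical). It uses plain `awk` instead of `gawk`, since nothing GNU-specific is needed, and adds `-s` (a stable sort, supported by GNU and BSD `sort` but not required by POSIX) to make explicit the assumption that, among lines with the same domain, the earliest one is kept.

```bash
#!/usr/bin/env bash
# Keep the first line for each domain, preserving the original line order.
dedupe_keep_order() {
  awk '{ print NR "@" $0 }' "$@" |   # decorate: "3@user@domain"
    sort -t '@' -s -u -k 3,3 |       # one line per domain (field 3)
    sort -t '@' -k 1,1n |            # restore input order by line number
    cut -d '@' -f 2-                 # undecorate: drop the line number
}

# Usage with made-up addresses:
printf '%s\n' a@x.com c@y.com b@x.com d@z.com | dedupe_keep_order
# prints:
# a@x.com
# c@y.com
# d@z.com
```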

2012-04-09 10:19:22
