Sorting multi-line records by field

I have a file of multi-line records, and I want to sort them by the type and the 6-digit number in the HEADER1 line.

Here are the records:

HEADER1|TYPE1|123456|JOHN SMITH 
INFO|M|34|SINGLE 
INFO|SGT 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE3|654123|DANICA CLYNE 
INFO|F|20|SINGLE 
STATUS|MIA 
MSG|HELP 
MSG1|| 
HEADER1|TYPE2|987456|NIDALEE LANE 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 
HEADER1|TYPE1|123456|JOHN CONNOR 
INFO|M|34|SINGLE 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE4|123789|CAITLYN MIST 
INFO|F|19|SINGLE 
INFO||| 
STATUS|NONE 
MSG|NONE 
HEADER1|TYPE2|987456|NIDALEE CROSS 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE

The output should look like this, with the matching records grouped together:

HEADER1|TYPE1|123456|JOHN SMITH 
INFO|M|34|SINGLE 
INFO|SGT 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE1|123456|JOHN CONNOR 
INFO|M|34|SINGLE 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE2|987456|NIDALEE LANE 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 
HEADER1|TYPE2|987456|NIDALEE CROSS 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 
HEADER1|TYPE3|654123|DANICA CLYNE 
INFO|F|20|SINGLE 
STATUS|MIA 
MSG|HELP 
MSG1|| 
HEADER1|TYPE4|123789|CAITLYN MIST 
INFO|F|19|SINGLE 
INFO||| 
STATUS|NONE 
MSG|NONE

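Source

2013-07-10 Soncire

Do you have any code you've tried? – kjprice

@kjprice I'm still working on one. – Soncire

Do the numbers correspond to the type? E.g., does TYPE1 always have the number 123456, or do you want them sorted by type and then by number? – chilemagic

Here is my solution.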

#!/bin/perl 

use warnings; 
use strict; 

# Read in the file 
open(my $fh, '<', "./record.txt") or die $!; 
my @lines = <$fh>; 
my @records; 

# populate @records with each element having 4 lines 
for (my $index = 0; $index < scalar @lines; $index+=4) { 
    push @records, join("", ($lines[$index], $lines[$index+1], $lines[$index+2], $lines[$index+3])); 
} 

# sort by type and then by numbers 
@records = map { $_->[0] } 
      sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] } 
      map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] } 
      @records; 

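# Print the list directly; interpolating "@records" would put a space before every record after the first
print @records;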

Here is an updated version with the same idea:

#!/bin/perl 

use warnings; 
use strict; 


open(my $fh, '<', "./record.txt") or die $!; 
my @lines = <$fh>; 
my $temp = join ("", @lines); 
my @records = split("HEADER1", "$temp"); 
my @new_records; 

for my $rec (@records){ 
    push @new_records, "HEADER1" . $rec; 
} 
shift @new_records; 



@records = map { $_->[0] } 
      sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] } 
      map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] } 
      @new_records; 



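# Print the list directly; interpolating "@records" would put a space before every record after the first
print @records;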
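
The split-and-reattach step above can also be written with a zero-width lookahead, so that "HEADER1" stays attached to each record and no empty leading element has to be shifted off. This is a minimal sketch, assuming every record starts with a line beginning with HEADER1| and every line ends in a newline. sort_records and the sample data are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file as one string and returns
# the records sorted by type (field 2), then by number (field 3).
sub sort_records {
    my ($text) = @_;

    # Split *before* each line that starts with "HEADER1|". The lookahead
    # consumes nothing, so every chunk keeps its own header line.
    my @records = split /^(?=HEADER1\|)/m, $text;

    # Extract the two sort keys once per record, sort, then unwrap.
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
           map  { [ $_, (split /\|/, $_)[1, 2] ] }
           @records;
}

# Shortened sample records, made up from the names in the question.
my $sample = join '',
    "HEADER1|TYPE3|654123|DANICA CLYNE\n", "INFO|F|20|SINGLE\n", "MSG1||\n",
    "HEADER1|TYPE1|123456|JOHN SMITH\n",   "INFO|SGT\n",
    "HEADER1|TYPE1|123456|JOHN CONNOR\n",  "STATUS|KIA\n";

print sort_records($sample);
```

Records with equal keys keep their input order here only because Perl's built-in sort has been a stable merge sort in practice for a long time; that is worth checking if the order of ties matters.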

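Source

2013-07-10 02:34:46 chilemagic

You only need to change how you read the input (reading 2 lines per loop iteration instead of 4); the sorting stays the same. Do you also want to sort by MSG? – chilemagic

No mate, just by type and number. I just wanted to let you know that the records don't always consist of 4 lines. – Soncire

If you don't care about performance, and each "record" consists of 4 lines: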

# Assume STDIN since the question didn't say anything 
my $line_index = 0; 
my (@records, @record); 
# Slurp in all records into array of quadruplets 
while (<>) { 
    if (0 == $line_index) { 
        push @records, []; 
    } 
    $records[-1]->[$line_index] = $_; # -1 lets you access last element of array. 
    $line_index++; 
    $line_index = 0 if $line_index == 4; # better done via "%" 
} 

# Sort the array. Since we sort by type+id, 
# we can simply sort the first strings alphabetically. 
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records; 

foreach my $record (@records_sorted) { 
    print join("", @$record); # Newlines never stripped, no need to append 
}

If you are feeling adventurous, use List::MoreUtils::natatime:

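use List::MoreUtils qw/natatime/; 
use File::Slurp qw/read_file/; 
my @lines = read_file("my_file.txt"); 
my $it = natatime 4, @lines; 
my @records; 
# Declare @record inside the loop so that each iteration pushes a reference to a fresh array 
while (my @record = $it->()) { 
    push @records, \@record; 
} 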
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records; 
foreach my $record (@records_sorted) { 
    print join("", @$record); 
}

Another option for creating @records from @lines is List::Gen:

use List::Gen qw/by/; 
foreach my $record (by 4 => @lines) { 
    push @records, $record; 
}

Note that the code above assumes that all the numbers are 6 digits long. If that is not the case, the code needs a slight modification:

use List::Gen qw/by/; 
use File::Slurp qw/read_file/; 
my @lines = read_file("my_file.txt"); 
my @records; 
foreach my $record (by 4 => @lines) { 
    # Fields are pipe-delimited; the pipe must be escaped in the pattern 
    my @sort_by = split(/\|/, $record->[0]); 
    push @records, [ $record, \@sort_by ]; 
} 
# Compare the type as a string, then the number numerically 
my @records_sorted = sort { 
          $a->[1]->[1] cmp $b->[1]->[1] 
          || $a->[1]->[2] <=> $b->[1]->[2] 
        } @records; 
foreach my $record (@records_sorted) { 
    print join("", @{$record->[0]}); 
}
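
To see why the number field needs a numeric comparison once its width varies, here is a small sketch with made-up values (the IDs are hypothetical and not taken from the question's data):

```perl
use strict;
use warnings;

my @ids = (123456, 99, 1000);

# String comparison goes character by character, so "1000" sorts
# before "123456", and "99" sorts last because "9" is greater than "1".
my @as_strings = sort { $a cmp $b } @ids;

# Numeric comparison orders by value.
my @as_numbers = sort { $a <=> $b } @ids;

print "cmp: @as_strings\n";
print "<=>: @as_numbers\n";
```

With fixed-width 6-digit numbers both comparisons give the same order, which is why the simpler versions above can sort the whole header line as a string.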

UPDATE: Since the OP decided that the input file may have an arbitrary number of lines per record, here is the updated code:

my @records; 
# Read all lines, starting a new record at every HEADER1 line 
while (<>) { 
    if (/^HEADER1/) { 
        # Fields are pipe-delimited; split the header line to get the sort keys 
        my @sort_by = split(/\|/); 
        push @records, [[], \@sort_by]; 
    } 
    push @{ $records[-1]->[0] }, $_; 
} 
# Compare the type as a string, then the number numerically 
my @records_sorted = sort { 
          $a->[1]->[1] cmp $b->[1]->[1] 
          || $a->[1]->[2] <=> $b->[1]->[2] 
        } @records; 
foreach my $record (@records_sorted) { 
    print join("", @{$record->[0]}); 
}

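Source

2013-07-10 02:27:27 DVK

Please note: this is not the most elegant or idiomatic Perl code. I tried to make it more newbie-friendly. – DVK

Thanks DVK. It's not always 4 lines, though; how should I adjust it? – Soncire

@Soncire - it depends on what your file looks like. I answered based on your question. If your file format is different, it's better to post it as a new question, since it's not fair to change the input file after I've posted a full answer :) – DVK

Using "apply" from List::MoreUtils and setting the input record separator to "HEADER", the code can look like the following.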

#!/usr/bin/perl 
use strict; 
use warnings; 
use List::MoreUtils qw/ apply /; 

my $fname = 'dup_data.txt'; 

open (my $input_fh, '<', $fname) or die "Unable to read '$fname' because $!"; 
open (my $OUTPUTA, ">", $fname .".reformat") 
    or die "$0: could not write to '$fname.reformat'. $!"; 

{ 
    local $/ = "HEADER"; 

    print $OUTPUTA map{ "HEADER$_->[0]"} 
        sort {$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]} 
        map {[$_, /TYPE(\d+)\|(\d+)/]} 
        grep $_, apply {chomp} <$input_fh>; 
} 
close $input_fh or die $!; 
close $OUTPUTA or die $!;
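
The effect of changing the input record separator can be seen in isolation with an in-memory filehandle. This is a minimal sketch using only core Perl; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $data = "HEADER1|TYPE2|987456|NIDALEE LANE\nMSG|NONE\n"
         . "HEADER1|TYPE1|123456|JOHN SMITH\nINFO|SGT\n";

# Open the string as if it were a file.
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my @chunks;
{
    local $/ = "HEADER";       # each read now ends at the next "HEADER"
    while (my $chunk = <$fh>) {
        chomp $chunk;          # chomp strips the trailing "HEADER", not "\n"
        next if $chunk eq '';  # the first read is only the leading "HEADER"
        push @chunks, $chunk;
    }
}
close $fh or die $!;

# Each chunk starts right after "HEADER", e.g. "1|TYPE2|987456|...".
print "[$_]\n" for @chunks;
```

This is why the answer's code puts "HEADER" back in front of every record before printing. Note that the approach relies on the string "HEADER" never appearing anywhere except at the start of a record.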

Source

2013-07-10 05:31:09
