2013-07-10 73 views
-2

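Sort multi-line records by field

I have records that each span multiple lines, and I want to sort them by the type and the 6-digit number in the HEADER1 line.

Here are the records: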

HEADER1|TYPE1|123456|JOHN SMITH 
INFO|M|34|SINGLE 
INFO|SGT 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE3|654123|DANICA CLYNE 
INFO|F|20|SINGLE 
STATUS|MIA 
MSG|HELP 
MSG1|| 
HEADER1|TYPE2|987456|NIDALEE LANE 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 
HEADER1|TYPE1|123456|JOHN CONNOR 
INFO|M|34|SINGLE 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE4|123789|CAITLYN MIST 
INFO|F|19|SINGLE 
INFO||| 
STATUS|NONE 
MSG|NONE 
HEADER1|TYPE2|987456|NIDALEE CROSS 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 

The output should look like this, with matching records grouped together:

HEADER1|TYPE1|123456|JOHN SMITH 
INFO|M|34|SINGLE 
INFO|SGT 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE1|123456|JOHN CONNOR 
INFO|M|34|SINGLE 
STATUS|KIA 
MSG|NONE 
HEADER1|TYPE2|987456|NIDALEE LANE 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 
HEADER1|TYPE2|987456|NIDALEE CROSS 
INFO|F|26|MARRIED 
STATUS|INJURED 
MSG|NONE 
HEADER1|TYPE3|654123|DANICA CLYNE 
INFO|F|20|SINGLE 
STATUS|MIA 
MSG|HELP 
MSG1|| 
HEADER1|TYPE4|123789|CAITLYN MIST 
INFO|F|19|SINGLE 
INFO||| 
STATUS|NONE 
MSG|NONE 
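
Have you tried any code? – kjprice

@kjprice I am still working on one. – Soncire

Do the numbers correspond to the type? E.g., does TYPE1 always have the number 123456, or do you want them sorted by type and then by number? – chilemagic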

Answers

2

Here is my solution.

#!/bin/perl 

use warnings; 
use strict; 

# Read in the file 
open(my $fh, '<', "./record.txt") or die $!;
my @lines = <$fh>; 
my @records; 

# populate @records with each element having 4 lines 
for (my $index = 0; $index < scalar @lines; $index+=4) { 
    push @records, join("", ($lines[$index], $lines[$index+1], $lines[$index+2], $lines[$index+3])); 
} 

# sort by type and then by numbers 
@records = map { $_->[0] } 
      sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] } 
      map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] } 
      @records; 

print @records;   # no quotes: "@records" would insert a space between records

Here is an updated version, same idea:

#!/bin/perl 

use warnings; 
use strict; 


open(my $fh, '<', "./record.txt") or die $!;
my @lines = <$fh>; 
my $temp = join ("", @lines); 
my @records = split("HEADER1", "$temp"); 
my @new_records; 

for my $rec (@records){ 
    push @new_records, "HEADER1" . $rec; 
} 
shift @new_records; 



@records = map { $_->[0] } 
      sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] } 
      map { [ $_ , (split('\|', $_))[1], (split('\|', $_))[2] ] } 
      @new_records; 



print @records;   # no quotes: "@records" would insert a space between records

What if INFO takes two lines, or MSG does? Like this: HEADER1|TYPE1|123456|JOHN SMITH INFO|M|34|SINGLE INFO|UNEMPLOYED STATUS|KIA MSG|NONE MSG1|NONE – Soncire

You just need to change how you read the file (read every 2 lines in the loop instead of 4); the sorting stays the same. Do you also want to sort by MSG? – chilemagic

No mate, just by type and number. I just wanted to let you know that records do not always consist of 4 lines. – Soncire

2

If you don't care about performance, and each "record" consists of 4 lines:

# Assume STDIN since the question didn't say anything 
my $line_index = 0; 
my (@records, @record); 
# Slurp in all records into array of quadruplets 
while (<>) { 
    if (0 == $line_index) { 
     push @records, []; 
    }; 
    $records[-1]->[$line_index] = $_; # -1 lets you access last element of array. 
    $line_index++; 
    $line_index = 0 if $line_index == 4; # better done via "%" 
} 

# Sort the array. Since we sort by type+id, 
# we can simply sort the first strings alphabetically. 
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records; 

foreach my $record (@records_sorted) { 
    print join("", @$record); # Newlines never stripped, no need to append 
} 

If you are feeling adventurous, use List::MoreUtils::natatime:

use List::MoreUtils qw/natatime/;
use File::Slurp;
my @lines = File::Slurp::read_file("my_file.txt");
my $it = natatime 4, @lines;
my (@records, @record);
while ((@record) = $it->()) {
    push @records, [@record];   # copy: \@record would point at the same array every time
}
my @records_sorted = sort { $a->[0] cmp $b->[0] } @records;
foreach my $record (@records_sorted) {
    print join("", @$record);
}

Another option for creating @records from @lines is List::Gen:

use List::Gen qw/by/; 
foreach my $record (by 4 => @lines) { 
    push @records, $record; 
} 

Note that the code above assumes all the numbers are 6 digits long. If that is not the case, the code needs to be modified a bit:

use List::Gen qw/by/;
use File::Slurp;
my @lines = File::Slurp::read_file("my_file.txt");
my @records;
foreach my $record (by 4 => @lines) {
    my @sort_by = split(/\|/, $record->[0]);   # [1] is the type, [2] is the number
    push @records, [ $record, \@sort_by ];
}
# Sort by type (string), then by number (numeric)
my @records_sorted = sort {
          $a->[1]->[1] cmp $b->[1]->[1]
          || $a->[1]->[2] <=> $b->[1]->[2]
        } @records;
foreach my $record (@records_sorted) {
    print join("", @{$record->[0]});
}
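
To illustrate why the digit count matters, here is a minimal sketch (the sample numbers are made up for the illustration, not taken from the question). String comparison with cmp goes character by character, so numbers of different lengths end up in the wrong order, while <=> compares them as numbers:

```perl
use strict;
use warnings;

# Hypothetical IDs with 6, 5 and 7 digits
my @ids = qw(123456 99999 1000000);

# cmp compares character by character, so "1000000" sorts before "99999"
my @as_strings = sort { $a cmp $b } @ids;

# <=> compares numeric values, which is what we want for IDs of varying length
my @as_numbers = sort { $a <=> $b } @ids;

print "cmp: @as_strings\n";
print "<=>: @as_numbers\n";
```

With a fixed width of 6 digits both comparisons give the same order, which is why the simpler versions get away with cmp.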

Update: since the OP has decided that the input file may have an arbitrary number of lines per record, here is the updated code:

my @records;
# Read all lines, starting a new record at every HEADER1 line
while (<>) {
    if (/^HEADER1\|/) {
        my @sort_by = split /\|/;   # [1] is the type, [2] is the number
        push @records, [[], \@sort_by];
    }
    push @{ $records[-1]->[0] }, $_;
}
# Sort by type (string), then by number (numeric)
my @records_sorted = sort {
          $a->[1]->[1] cmp $b->[1]->[1]
          || $a->[1]->[2] <=> $b->[1]->[2]
        } @records;
foreach my $record (@records_sorted) {
    print join("", @{$record->[0]});
}
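
Please note: this is not the most elegant or idiomatic Perl code. I tried to make it more newbie-friendly. – DVK

Thanks DVK. It won't always be 4 lines; how should I adjust it? – Soncire

@Soncire - That depends on what your file looks like. I answered based on your question. If your file format is different, it is better to post that as a new question, since it is unfair to change the input file after I have posted a complete answer :) – DVK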

1

Using "apply" from List::MoreUtils and setting the input record separator to "HEADER", the code can look like the following.

#!/usr/bin/perl 
use strict; 
use warnings; 
use List::MoreUtils qw/ apply /; 

my $fname = 'dup_data.txt'; 

open (my $input_fh, '<', $fname) or die "Unable to read '$fname' because $!"; 
open (my $OUTPUTA, ">", $fname .".reformat") 
    or die "$0: could not write to '$fname.reformat'. $!"; 

{ 
    local $/ = "HEADER"; 

    print $OUTPUTA map{ "HEADER$_->[0]"} 
        sort {$a->[1] <=> $b->[1] || $a->[2] <=> $b->[2]} 
        map {[$_, /TYPE(\d+)\|(\d+)/]} 
        grep $_, apply {chomp} <$input_fh>; 
} 
close $input_fh or die $!; 
close $OUTPUTA or die $!;
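
The same record-separator idea can be sketched without any CPAN module. This is only an illustration under a few assumptions: the helper name sort_records and the two-record sample are invented here, the input is read from an in-memory string instead of a file, and "HEADER1|" is used as the separator rather than "HEADER".

```perl
use strict;
use warnings;

# Hypothetical helper: split text into records that each start with "HEADER1|",
# then sort them by type (string compare) and number (numeric compare).
sub sort_records {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";

    my @chunks;
    {
        local $/ = "HEADER1|";   # each read now ends at the next "HEADER1|"
        @chunks = <$fh>;
        chomp @chunks;           # chomp strips the trailing $/ string, if present
    }
    close $fh;

    return map  { "HEADER1|" . $_->[0] }                        # put the separator back
           sort { $a->[1] cmp $b->[1] || $a->[2] <=> $b->[2] }
           map  { [ $_, (split /\|/, $_)[0, 1] ] }              # chunk begins "TYPE|number|..."
           grep { length $_ }                                   # drop the empty chunk before the first header
           @chunks;
}

# Made-up sample with records of different lengths
my $sample = "HEADER1|TYPE2|987456|NIDALEE LANE\nSTATUS|INJURED\n"
           . "HEADER1|TYPE1|123456|JOHN SMITH\nINFO|SGT\nSTATUS|KIA\n";

print sort_records($sample);
```

Because each chunk runs from one header to the next, the number of lines per record does not matter, and Perl's sort keeps records with equal keys in their original order.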