Perl: Match on duplicate first field, merge last field

See the question here: match-rows-based-on-first-field-and-combine-second-field

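How would you solve this in Perl, given the following conditions:

A CSV file
It has several duplicate records that differ in category
You need to match on the first field and merge/append the classification field

Sample file: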

External ID  Item Name  Item Description  Release Date Expiry Date  Weight Template ID  Enabled EntityId  Classifications Address N/A  City State Zipcode Country of domain purchase is made from Title Cover Image  Link Author 
411280 Shade me  Shade me  04-May-2017  01-Jan-9999    0  Y  -1  Teen             Shade me  MC.GIF  http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer 
411280 Shade me  Shade me  04-May-2017  01-Jan-9999    0  Y  -1  Books             Shade me  MC.GIF  http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer 
413036 Now that's what I call music! Now that's what I call music! 04-May-2017  01-Jan-9999    0  Y  -1  Teen             Now that's what I call music! MC.GIF http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=413036

The challenge is to match the duplicate IDs and merge the classifications.

411280 Shade me  Shade me  04-May-2017  01-Jan-9999    0  Y  -1  Teen;Books             Shade me  MC.GIF  http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer

UPDATE

while (<FILE>) { 
     next if 1..1; 
     chomp $_; 

     my ($id, $name, $desc, $reldate, $expdate, $weight, $temp, $enabled, $ent, $class, $addr, $na, $city, $state, $zip, $country, $title, $img, $link, $auth) = split /\t/ , $_; 

     if (! $merge{$id}) { 
       $merge{$id} = "$id, $name, $desc, $reldate, $expdate, $weight, $temp, $enabled, $ent, $class, $addr, $na, $city, $state, $zip, $country, $title, $img, $link, $auth"; 
     } else { 
       $merge{$class} .= ";$class" if ($merge{$id} ne $class) 
     } 
} 

p %merge;

The line giving me trouble is:

$merge{$class} .= ";$class" if ($merge{$id} ne $class)

You can see what I need to do: merge the classification field. It isn't working.

Source

2017-05-04 Bubnoff

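What have you tried so far? What specific problem or error are you running into? Where is the shortest code needed to reproduce the problem? –

Will post code. Actually, it almost works. – Bubnoff

Code posted. It almost works, but it doesn't combine the classification field. – Bubnoff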
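
For what it's worth, the posted line appends to $merge{$class} (keyed by the classification) rather than to the entry for $id, and it compares the whole joined record in $merge{$id} against $class, which will practically never be equal. Appending to a joined string would also put the extra class at the end of the record instead of in the Classifications column. A minimal sketch of one way around this, assuming the rows are kept as arrays and the classification sits at a known column index (index 9 in the real file, index 2 in the shortened, made-up rows below). The sub name merge_classifications and the four-column sample rows are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: merge tab-separated rows by their first field,
# joining the distinct values of column $class_idx with ';'.
# The first row seen for an ID supplies all other columns.
sub merge_classifications {
    my ($lines, $class_idx) = @_;
    my (%row_for, %seen, @order);
    for my $line (@$lines) {
        chomp(my $copy = $line);
        my @f     = split /\t/, $copy, -1;   # -1 keeps trailing empty fields
        my $id    = $f[0];
        my $class = $f[$class_idx] // '';
        if (!$row_for{$id}) {
            # First appearance: keep the whole row and remember input order
            $row_for{$id} = \@f;
            push @order, $id;
            $seen{$id}{$class} = 1;
        }
        elsif ($class ne '' && !$seen{$id}{$class}++) {
            # Later appearance with a class not yet recorded for this ID
            my $cur = $row_for{$id}[$class_idx] // '';
            $row_for{$id}[$class_idx] = $cur eq '' ? $class : "$cur;$class";
        }
    }
    return map { join "\t", @{ $row_for{$_} } } @order;
}

# Shortened sample rows (ID, name, classification, image), not the full layout
my @rows = (
    "411280\tShade me\tTeen\tMC.GIF",
    "411280\tShade me\tBooks\tMC.GIF",
    "413036\tNow that's what I call music!\tTeen\tMC.GIF",
);
print "$_\n" for merge_classifications(\@rows, 2);
```

With the real file you would skip the header line, read the lines from the file handle, and pass 9 as the column index. Tracking seen classes in a hash per ID makes the duplicate check an exact comparison rather than a string or regex test.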

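I would load the file into some data structure, remember each unique value per column, and then print them as needed. For example (using | as the delimiter, because it is easier to see than \t):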

#!/usr/bin/env perl 

use 5.024; 
use warnings; 
use Data::Dumper; 

my $records; 
my $numcols; 
while(<DATA>) { 
    chomp; 
    my(@cols) = split /\|/, $_, -1; 
    $numcols = @cols if($. == 1); 
    die "Wrong number of columns (@{[scalar @cols]} instead of $numcols) in line $." unless (@cols == $numcols); 
    $records->{$cols[0]}->[0] = $. unless $records->{$cols[0]}; # remember the line number of the first appearance
    for(my $c = 1; $c < $numcols; $c++) { #skip the id (col[0]) 
     $records->{$cols[0]}->[$c]->{$cols[$c]}++; 
    } 
} 
# if desired, inspect the data structure
#say Dumper($records); 

for my $id (sort {$records->{$a}->[0] <=> $records->{$b}->[0]} keys %$records) { 
    say join("|", 
      $id, 
      map { join(';', sort grep {/\S/} keys $records->{$id}->[$_]->%*) } 1 .. $#{$records->{$id}} #skip col[0] 
     ); 
} 

__DATA__ 
ID|Name1|Name2|Name3 
id1|c11|c12|c13 
id1|c11|c12|c13 
id2|c21|c22|c23 
id1|c31|c12|c13 
id3|c41||c43 
id1|c51|c12|c13 
id1|c31||c13 
id1|c11||c13 
id1|c31|c12|c13 
id2|c21|c22|c83 
id4|c91|c92|

Prints:

ID|Name1|Name2|Name3 
id1|c11;c31;c51|c12|c13 
id2|c21|c22|c23;c83 
id3|c41||c43 
id4|c91|c92|

Using some shell tools to get nicely aligned columns:

perl script.pl | sed 's/||/| |/g' | column -s'|' -t

ID Name1  Name2 Name3 
id1 c11;c31;c51 c12 c13 
id2 c21   c22 c23;c83 
id3 c41     c43 
id4 c91   c92

Source

2017-05-05 13:18:12 jm666

I have 5.22 and get a syntax error near "->%". Is there an equivalent for this? – Bubnoff

That's line 25. Is there another way of writing it? – Bubnoff

Got it! Use the following: 'use feature qw(postderef say);'. – Bubnoff
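
As far as I know, postfix dereference (->%*) was still experimental in 5.20 and 5.22 and only became available by default in 5.24, which is why the feature has to be enabled explicitly on 5.22 (and may trigger an "experimental" warning there). If you would rather not depend on the feature at all, the classic circumfix form should be equivalent. A small sketch, where $column is a hypothetical stand-in for one $records->{$id}->[$_] slot:

```perl
use strict;
use warnings;

# Hypothetical stand-in for one column slot of the $records structure:
# a hash ref whose keys are the distinct values seen in that column.
my $column = { c11 => 3, c31 => 3, c51 => 1, '' => 2 };

# Circumfix dereference, %{ ... }, works without the postderef feature.
# The grep drops empty or whitespace-only values, as in the answer's code.
my $joined = join ';', sort grep { /\S/ } keys %{$column};

print "$joined\n";   # prints c11;c31;c51
```

In the answer's script that would mean writing keys %{ $records->{$id}->[$_] } in place of keys $records->{$id}->[$_]->%*.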

A simple approach for small files could look like this:

#!/usr/bin/env perl 

use common::sense;  
use DDP; 

my %merge; 

while (<DATA>) 
{ 

next if 1..1; 

chomp $_; 

my ($id, $text, $category) = split /,/ , $_; 

if (! $merge{$id}) 
{ 
    $merge{$id} = "$id,$text,$category"; 
} 
else 
{ 
     my (undef, undef , $c) = split /,/ , $merge{$id}; 

     if ($c !~ /\b$category\b/) 
     { 
      $merge{$id} .= ";$category"; 
     } 
} 


} 

p %merge; 

__DATA__ 
Id Title    Category 
12345,My favorite martian,aliens 
13444,Texas Meat,BBQ 
12345,My favorite martian,aliens
12345,My favorite martian,space

Output:

{ 
    12345 "12345,My favorite martian,aliens;space", 
    13444 "13444,Texas Meat,BBQ" 
}

This helps if you don't want categories to be repeated.

Source

2017-05-04 21:23:50 carlosn

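With a tweak or two, this will work nicely. Thanks! – Bubnoff

If the first category is 'icecream' and the second is 'ice', they won't be merged, because they match as substrings. – jm666

@jm666 - What would you suggest? An exact match instead of a regex? Would that be '($merge{$id} ne $category)', correct? – Bubnoff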
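
A note on the exact-match idea: in the code above $merge{$id} holds the whole "id,text,category" record, so comparing it to $category with ne would always be true and every duplicate would be appended. A regex test can also give false positives (for instance, \b sees 'sci' as a whole word inside 'sci-fi') and breaks on categories containing regex metacharacters. One possible alternative, sketched under the assumption that categories never contain ';', is to split the existing list and compare whole entries. The sub name add_category is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: append $category to a ';'-separated list
# only if no existing entry is exactly equal to it.
sub add_category {
    my ($list, $category) = @_;
    return $category if $list eq '';
    my %have = map { $_ => 1 } split /;/, $list;   # exact entries as hash keys
    return $have{$category} ? $list : "$list;$category";
}

print add_category('icecream', 'ice'), "\n";       # prints icecream;ice
print add_category('icecream;ice', 'ice'), "\n";   # prints icecream;ice
```

In the answer's loop this would be applied to the category part only (the $c obtained from the split), not to the whole $merge{$id} string.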
