2017-05-04 12 views
0

A similar question is here: match-rows-based-on-first-field-and-combine-second-field

Perl: match on a duplicate first field and merge the last field

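How would you solve this in Perl, given the following conditions:

  • A CSV file
  • Duplicate records that differ only in their classification
  • You need to match on the first field and merge/append the classification field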

Sample file:

External ID  Item Name  Item Description  Release Date Expiry Date  Weight Template ID  Enabled EntityId  Classifications Address N/A  City State Zipcode Country of domain purchase is made from Title Cover Image  Link Author 
411280 Shade me  Shade me  04-May-2017  01-Jan-9999    0  Y  -1  Teen             Shade me  MC.GIF  http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer 
411280 Shade me  Shade me  04-May-2017  01-Jan-9999    0  Y  -1  Books             Shade me  MC.GIF  http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer 
413036 Now that's what I call music! Now that's what I call music! 04-May-2017  01-Jan-9999    0  Y  -1  Teen             Now that's what I call music! MC.GIF http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=413036 

The challenge is to match the duplicate IDs and merge their classifications.

411280 Shade me  Shade me  04-May-2017  01-Jan-9999    0  Y  -1  Teen;Books             Shade me  MC.GIF  http://catalog.org/cgi-bin/koha/opac-detail.pl?biblionumber=411280 Brown, Jennifer 

UPDATE

while (<FILE>) { 
     next if 1..1; 
     chomp $_; 

     my ($id, $name, $desc, $reldate, $expdate, $weight, $temp, $enabled, $ent, $class, $addr, $na, $city, $state, $zip, $country, $title, $img, $link, $auth) = split /\t/ , $_; 

     if (! $merge{$id}) { 
       $merge{$id} = "$id, $name, $desc, $reldate, $expdate, $weight, $temp, $enabled, $ent, $class, $addr, $na, $city, $state, $zip, $country, $title, $img, $link, $auth"; 
     } else { 
       $merge{$class} .= ";$class" if ($merge{$id} ne $class) 
     } 
} 

p %merge; 

The line giving me trouble is:

$merge{$class} .= ";$class" if ($merge{$id} ne $class) 

You can see what I need to do: merge the Classifications field. It isn't working.

+1

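What have you tried so far? What specific problem or error are you running into? Where is the shortest code needed to reproduce the problem? –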

+1

Will post code. Actually, it almost works. – Bubnoff

+0

Code posted. It almost works, but it doesn't combine the classification field. – Bubnoff

Answers

2

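I would load the file into a data structure that remembers each unique value per column, and then print the values as needed. For example (using | here because it is easier to see than a \t delimiter):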

#!/usr/bin/env perl 

use 5.024; 
use warnings; 
use Data::Dumper; 

my $records; 
my $numcols; 
while(<DATA>) { 
    chomp; 
    my(@cols) = split /\|/, $_, -1; 
    $numcols = @cols if($. == 1); 
    die "Wrong number of columns (@{[scalar @cols]} instead of $numcols) in line $." unless (@cols == $numcols); 
    $records->{$cols[0]}->[0] = $. unless $records->{$cols[0]}; #remember the line# of the 1st appearance 
    for(my $c = 1; $c < $numcols; $c++) { #skip the id (col[0]) 
     $records->{$cols[0]}->[$c]->{$cols[$c]}++; 
    } 
} 
# if want, check the data-structure 
#say Dumper($records); 

for my $id (sort {$records->{$a}->[0] <=> $records->{$b}->[0]} keys %$records) { 
    say join("|", 
      $id, 
      map { join(';', sort grep {/\S/} keys $records->{$id}->[$_]->%*) } 1 .. $#{$records->{$id}} #skip col[0] 
     ); 
} 

__DATA__ 
ID|Name1|Name2|Name3 
id1|c11|c12|c13 
id1|c11|c12|c13 
id2|c21|c22|c23 
id1|c31|c12|c13 
id3|c41||c43 
id1|c51|c12|c13 
id1|c31||c13 
id1|c11||c13 
id1|c31|c12|c13 
id2|c21|c22|c83 
id4|c91|c92| 

prints

ID|Name1|Name2|Name3 
id1|c11;c31;c51|c12|c13 
id2|c21|c22|c23;c83 
id3|c41||c43 
id4|c91|c92| 

With a bit of shell to align the columns: perl script.pl | sed 's/||/| |/g' | column -s'|' -t

ID   Name1        Name2  Name3
id1  c11;c31;c51  c12    c13
id2  c21          c22    c23;c83
id3  c41                 c43
id4  c91          c92

+0

I have 5.22 and get an error near "->%". Is there an equivalent for this? – Bubnoff

+0

That's on line 25. Is there another way to write it? – Bubnoff

+0

Got it! Used the following: 'use feature qw(postderef say);'. – Bubnoff
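
On Perl versions older than 5.24, the postfix dereference on that line can also be written in the classic circumfix form, which needs no feature flag. A minimal sketch, assuming the same layout as `$records` in the answer; the sample values and the `merged_column` helper name are made up for illustration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample mirroring the answer's structure:
# $records->{$id}->[$col] is a hash ref whose keys are the values seen.
my $records = {
    id1 => [ 2, { c11 => 2, c31 => 1 }, { c12 => 3, '' => 1 } ],
};

# Circumfix dereference: %{ EXPR } instead of EXPR->%*
sub merged_column {
    my ($id, $col) = @_;
    return join ';', sort grep { /\S/ } keys %{ $records->{$id}->[$col] };
}

# Skip element 0, which holds the line number of the first appearance.
print join('|', 'id1', map { merged_column('id1', $_) } 1 .. $#{ $records->{id1} }), "\n";
```

The two spellings are equivalent; the circumfix one simply works on any Perl 5.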

-2

A simple approach for small files could be done as follows:

#!/usr/bin/env perl 

use common::sense;  
use DDP; 

my %merge; 

while (<DATA>) 
{ 

next if 1..1; 

chomp $_; 

my ($id, $text, $category) = split /,/ , $_; 

if (! $merge{$id}) 
{ 
    $merge{$id} = "$id,$text,$category"; 
} 
else 
{ 
     my (undef, undef , $c) = split /,/ , $merge{$id}; 

     if ($c !~ /\b$category\b/) 
     { 
      $merge{$id} .= ";$category"; 
     } 
} 


} 

p %merge; 

__DATA__ 
Id Title    Category 
12345,My favorite martian,aliens 
13444,Texas Meat,BBQ 
12345,My favorite martian,space

Output:

{ 
    12345 "12345,My favorite martian,aliens;space", 
    13444 "13444,Texas Meat,BBQ" 
} 

Since you don't want repeated categories, this should help.

+0

With a tweak or two, this will work nicely. Thanks! – Bubnoff

+0

If the first category is 'icecream' and the second is 'ice', they won't be merged, because the match is on a substring. – jm666

+0

@jm666 - What would you suggest? An exact match instead of a REGEX? Would that be '($merge{$id} ne $category)', correct? – Bubnoff
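
One possible way to get an exact match without a regex is to remember each category per ID in a hash and append only the ones not yet seen. Note that `$merge{$id} ne $category` compares the whole stored row with the category, so it would be true for every row. A minimal sketch, assuming the three comma-separated fields used in this answer; `merge_rows`, `%seen` and `@order` are illustrative names, not part of the original code:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Merge rows sharing an ID, joining distinct categories with ';'.
# Returns the merged rows in order of first appearance.
sub merge_rows {
    my @lines = @_;
    my (%merge, %seen, @order);
    for my $line (@lines) {
        my ($id, $text, $category) = split /,/, $line, -1;
        if (!exists $merge{$id}) {
            # First time this ID appears: keep the whole row.
            $merge{$id} = "$id,$text,$category";
            push @order, $id;
        }
        elsif (!$seen{$id}{$category}) {
            # Hash lookup is an exact string match, so 'ice' and
            # 'icecream' are treated as different categories.
            $merge{$id} .= ";$category";
        }
        $seen{$id}{$category} = 1;
    }
    return map { $merge{$_} } @order;
}

print "$_\n" for merge_rows(
    '12345,My favorite martian,aliens',
    '13444,Texas Meat,BBQ',
    '12345,My favorite martian,space',
    '12345,My favorite martian,aliens',
);
```

For the tab-separated file in the question, the same idea should carry over with `split /\t/`, but since Classifications sits in the middle of the row (index 9), the fields would need to be kept in an array and the category appended to that element rather than to the end of a joined string.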
