Retrieve lines between patterns using Perl

I have a file containing a list similar to the following:

ID: ID_A 
attr1: attribute 
attr2: name 
attr3: city 


ID: ID_B 
attr1: attribute2 
attr2: name2 
attr3: city3 
attr4: country

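The file contains about 60K such entries. The unique identifier is always on the ID line. Once I hit a new ID, I need to be able to retrieve all the attributes for that ID.

I tried the following: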

if($line=/ID/../ID) 
{ 
    $job[0]=$line 
}

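However, this doesn't work, and I would also have to create an array each time that is just big enough or small enough. Any hints on how to proceed would be very helpful.

Thanks. JS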

2015-12-09 John Squarry

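What is the expected output? –

Not just the output, but how do you plan to use the data once you've split it up? –

Are the entries always separated by a blank line? – choroba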

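I would create a hash-of-hashes (because you don't know which attributes you might encounter in the file). The keys of the master hash are the IDs, and the value for each ID is another sub-hash. That sub-hash has the attribute names as its keys.

This is not idiomatic Perl at all, but it works in my tests...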

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %master;
my %tmphash;
my $oldid = "";
my $id;

# Create a hash-of-hashes
while (<>) {
    if (/^ID: (.*)/) {
        $id = $1;
        # We need to skip the first one to "prime the pump"
        if ($oldid ne "") {
            $master{$oldid} = {%tmphash};
        }
        $oldid   = $id;
        %tmphash = ();
    } else {
        # Until we get to the next ID: add anything we find to tmphash.
        # The non-greedy (.*?) stops the key at the first ": ".
        if (/^(.*?): (.*)/) {
            $tmphash{$1} = $2;
        }
    }
}
# Don't forget the last one (skipped if the input had no ID line at all)...
$master{$oldid} = {%tmphash} if $oldid ne "";

print Dumper(\%master);

foreach my $id (sort keys %master) {
    foreach my $attr (keys %{ $master{$id} }) {
        print "$id, $attr: $master{$id}{$attr}\n";
    }
}

2015-12-10 02:15:24

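It's hard to give a decent answer without knowing your desired output format or how you plan to use the data, but this will get you 90% of the way there: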

use strict; 
use warnings; 

my %data; 
my $id; 

while (<DATA>) { 
    chomp; 
    next unless /\S/; 
    # Limit the split to 2 fields so a value that contains ':' stays intact
    my ($key, $value) = split(/\s*:\s*/, $_, 2);

    if ($key eq 'ID') {
        $id = $value;
        next;
    }

    $data{$id}{$key} = $value;
} 

print "$data{ID_B}{attr2}\n"; # prints name2 

__DATA__ 
ID: ID_A 
attr1: attribute 
attr2: name 
attr3: city 

ID: ID_B 
attr1: attribute2 
attr2: name2 
attr3: city3 
attr4: country
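
Once a hash of hashes like %data exists, retrieving every attribute for one ID is a plain lookup. The following is only a minimal sketch: the helper name attributes_for is hypothetical (it is not part of the answer above), and the sample hash is filled in by hand so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): return the attributes
# of one ID as "key: value" strings, or an empty list for an unknown ID.
sub attributes_for {
    my ($data, $id) = @_;
    # exists() avoids autovivifying an entry for an ID we never saw
    return () unless exists $data->{$id};
    my $attrs = $data->{$id};
    return map { "$_: $attrs->{$_}" } sort keys %$attrs;
}

# Sample data, filled in by hand to mirror the structure built above
my %data = (
    ID_A => { attr1 => 'attribute',  attr2 => 'name',  attr3 => 'city' },
    ID_B => { attr1 => 'attribute2', attr2 => 'name2', attr3 => 'city3',
              attr4 => 'country' },
);

print "$_\n" for attributes_for(\%data, 'ID_B');
```

Checking with exists first means a lookup for a missing ID does not silently add an empty entry to the hash.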

2015-12-10 04:53:14

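This is much easier if you use $/, the record separator, and set it to "\n\n".

But as pointed out in Dave Cross's comment, it is better to set it to '', because Perl will then skip over multiple blank lines and otherwise achieve the same result.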

#!/usr/bin/perl 
use strict; 
use warnings; 

use Data::Dumper; 

#set record separator to (one or more) blank lines 
local $/ = ''; 

#iterate each chunk of data 
while (<DATA>) { 
    #g matches repeatedly, and so this'll get alternating values 
    #this conveniently is what you need to assign straight to a hash 
    my %record = m/(\w+): (.*)/g; 
    print Dumper \%record; 
} 

__DATA__ 
ID: ID_A 
attr1: attribute 
attr2: name 
attr3: city 

ID: ID_B 
attr1: attribute2 
attr2: name2 
attr3: city3 
attr4: country
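
The difference between "\n\n" and '' only shows up when records are separated by more than one blank line. The sketch below is an illustration, not part of the original answer: the helper read_records and the sample text are made up for the demonstration, and the input is read from an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text with record separator $sep and
# return every chunk that readline hands back.
sub read_records {
    my ($text, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @chunks;
    while (my $chunk = <$fh>) {
        push @chunks, $chunk;
    }
    close $fh;
    return @chunks;
}

# Two records separated by THREE blank lines
my $text = "ID: ID_A\nattr1: x\n\n\n\nID: ID_B\nattr1: y\n";

my @fixed     = read_records($text, "\n\n");
my @paragraph = read_records($text, '');

printf "separator \"\\n\\n\": %d chunks\n", scalar @fixed;
printf "separator '':     %d chunks\n", scalar @paragraph;
```

With "\n\n" the extra blank lines come back as a chunk of their own, which would have to be filtered out; in paragraph mode ('') any run of blank lines counts as a single separator.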

Once you have extracted your records and fields, you can put them into an array of records:

push (@all_records, \%record);

which gives:

$VAR1 = [ 
      { 
      'attr2' => 'name', 
      'ID' => 'ID_A', 
      'attr1' => 'attribute', 
      'attr3' => 'city' 
      }, 
      { 
      'attr2' => 'name2', 
      'ID' => 'ID_B', 
      'attr4' => 'country', 
      'attr1' => 'attribute2', 
      'attr3' => 'city3' 
      } 
     ];

Or put them into a hash of hashes, keyed on the ID:

$all_records{$record{ID}} = \%record;

Giving:

$VAR1 = { 
      'ID_A' => { 
         'ID' => 'ID_A', 
         'attr3' => 'city', 
         'attr1' => 'attribute', 
         'attr2' => 'name' 
        }, 
      'ID_B' => { 
         'attr2' => 'name2', 
         'attr3' => 'city3', 
         'attr1' => 'attribute2', 
         'attr4' => 'country', 
         'ID' => 'ID_B' 
        } 
     };

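It depends a bit on what you are doing with the records. You may not need to "keep" them at all if you are just processing and discarding them. And if you have duplicate IDs, you probably don't want the hash-of-hashes approach, because the IDs must be unique for it to work.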
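
If duplicate IDs are possible, one option is to store an array of records per ID instead of a single record. This is a sketch under assumptions, not something the answer prescribes: the sub name group_by_id, the stricter field regex that trims surrounding whitespace, and the sample text are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated records from a filehandle
# and group them by ID. Each ID maps to an array of records, so a repeated
# ID is kept rather than overwritten.
sub group_by_id {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: blank lines separate records
    my %by_id;
    while (my $chunk = <$fh>) {
        # One key/value pair per line, trailing whitespace trimmed
        my %record = $chunk =~ /^(\w+):[ \t]*(.*?)[ \t]*$/mg;
        next unless defined $record{ID};    # skip chunks without an ID
        push @{ $by_id{ $record{ID} } }, \%record;
    }
    return \%by_id;
}

# Sample input in which ID_A appears twice
my $text = "ID: ID_A\nattr1: first\n\n"
         . "ID: ID_B\nattr1: other\n\n"
         . "ID: ID_A\nattr1: second\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $by_id = group_by_id($fh);
close $fh;

for my $id (sort keys %$by_id) {
    printf "%s: %d record(s)\n", $id, scalar @{ $by_id->{$id} };
}
```

If the records only need to be processed once, the body of the while loop can do that work directly and skip the push, so nothing is held in memory.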

2015-12-10 07:52:58 Sobrique

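Setting '$/' to an empty string has the same effect. It also works if (for some reason) there are multiple blank lines between records. –

Excellent point. I'll amend it. – Sobrique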
