Parsing (partially) inconsistent text blocks in Perl

I have several blocks of text in a file (and, at this point in my program, in a variable) that look like this:

Vlan2 is up, line protocol is up 
    .... 
    reliability 255/255, txload 1/255, rxload 1/255^M 
    .... 
    Last clearing of "show interface" counters 49w5d 
    Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0 
    .... 
    L3 out Switched: ucast: 17925 pkt, 23810209 bytes mcast: 0 pkt, 0 bytes 
    33374 packets input, 13154058 bytes, 0 no buffer 
    Received 926 broadcasts (0 IP multicasts) 
    0 runts, 0 giants, 0 throttles 
    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored 
    3094286 packets output, 311981311 bytes, 0 underruns 
    0 output errors, 0 interface resets 
    0 output buffer failures, 0 output buffers swapped out

Here is a second block, to show you how the blocks can differ slightly:

port-channel86 is down (No operational members) 
    ... 
    reliability 255/255, txload 1/255, rxload 1/255 
    ... 
    Last clearing of "show interface" counters 31w2d 
    ... 
    RX 
    147636 unicast packets 0 multicast packets 0 broadcast packets 
    84356 input packets 119954232 bytes 
    0 jumbo packets 0 storm suppression packets 
    0 runts 0 giants 0 CRC 0 no buffer 
    0 input error 0 short frame 0 overrun 0 underrun 0 ignored 
    0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop 
    0 input with dribble 0 input discard 
    0 Rx pause 
    TX 
    147636 unicast packets 0 multicast packets 0 broadcast packets 
    84356 output packets 119954232 bytes 
    0 jumbo packets 
    0 output error 0 collision 0 deferred 0 late collision 
    0 lost carrier 0 no carrier 0 babble 0 output discard 
    0 Tx pause 
    0 interface resets

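I want to pick out certain data elements from each block, which may or may not be present in every block. For example, from the first block I posted I might want to know that there are 0 runts, 0 input errors and 0 overruns. From the second block I might want to know that there are 0 jumbo packets, 0 collisions, and so on. If a queried field is not in the block, simply returning na is acceptable, since this is meant for uniform processing.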

Every block is structured like the two I posted: some entries are separated by newlines and spaces, others by commas.

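I have a few ideas about how this might work. I don't know whether Perl has any kind of "look-back" function, but I could try to find the field name (runts, "input errors", etc.) and then grab the integer that precedes it; that seems like the most elegant solution, but I'm not sure it is possible.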
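No special look-back function is needed for this idea: a capture group placed in front of the field name picks up the preceding integer. The following is a minimal sketch, assuming the number is always separated from its label by whitespace; `value_before` is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the integer that immediately precedes
# $field in $block, or 'na' when the field is absent.
sub value_before {
    my ($block, $field) = @_;
    # The leading \b stops digits glued to a word (the 2 in "Vlan2")
    # from counting as a value; \Q..\E quotes regex metacharacters
    # in the field name; the trailing \b keeps "input error" from
    # matching "input errors".
    return $block =~ /\b(\d+)\s+\Q$field\E\b/ ? $1 : 'na';
}

my $sample = "0 runts, 0 giants, 0 throttles\n"
           . "0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored\n";

print value_before($sample, 'input errors'), "\n";   # prints 0
print value_before($sample, 'collision'), "\n";      # prints na
```

Because of the trailing `\b`, the singular and plural spellings used by the two block styles ("input error" and "input errors") would have to be queried separately.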

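Currently I am doing this in Perl. Each "chunk" I process is actually several of these blocks (separated by double newlines). It does not have to be done in a single regex; I believe it can be done by applying several regexes to each block. Performance is not really a factor, since this script runs once an hour.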
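Splitting such a chunk into its individual blocks could look roughly like this. It is only a sketch: the two shortened blocks are stand-ins for real input, and the carriage-return clean-up is an assumption based on the stray ^M visible in the first sample.

```perl
use strict;
use warnings;

# Assumed input: one string holding several interface blocks,
# separated by one or more blank lines.
my $chunk = <<'END';
Vlan2 is up, line protocol is up
    0 runts, 0 giants, 0 throttles

port-channel86 is down (No operational members)
    0 runts 0 giants 0 CRC 0 no buffer
END

# Normalise CRLF/CR line endings, then split on blank lines
# (lines that are empty or hold only spaces/tabs).
$chunk =~ s/\r\n?/\n/g;
my @blocks = grep { /\S/ } split /\n(?:[ \t]*\n)+/, $chunk;

for my $block (@blocks) {
    # The first word of each block is the interface name.
    my ($name) = $block =~ /^\s*(\S+)/;
    print "$name\n";
}
```

When reading straight from a file, setting `$/ = ''` (paragraph mode) makes each read return one blank-line-delimited block, although that mode only treats truly empty lines as separators.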

My goal is to get all of this into a .csv file (or some other easily graphed data format) in an automated way.

Any ideas?

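Edit: Regarding the CSV output example I mentioned: the final result will be a file written line by line (for multiple entries like these). If a particular entry is not found in a block, it is marked as na in the corresponding row: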

Source

2013-08-19 jyaworski

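Can you post sample output? – Jotne

Done. I hope that answers the question. – jyaworski

It is impossible to infer the overall layout of the input from a single sample block. Why not cut the block down to something that is just representative of a real block, and then post five or so blocks so that we can see how the format may vary between them? –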

A simple hash of attributes and numbers.

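use strict;
use warnings;

# Collect every "<number> <label>" pair in a block into a hash (label => number).
sub extract {
    my ($block) = @_;
    my %r;
    # \h (horizontal whitespace) keeps a label from running on across a newline.
    while ($block =~ /(?<num>\d+) \s (?<name>[A-Za-z\h]+)/gmsx) {
        my $name = $+{name};
        my $num = $+{num};
        # Trim leading and trailing whitespace from the label.
        $name =~ s/\A \s+//msx;
        $name =~ s/\s+ \z//msx;
        next unless length $name;   # skip numbers followed only by spaces
        $r{$name} = $num;
    }
    return %r;
}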

my $block = <<''; 
Vlan2 is up, line protocol is up 
⋮ 

my $block2 = <<''; 
port-channel86 is down (No operational members) 
⋮ 

use Data::Dumper qw(Dumper);
# Wrap the returned list in an anonymous hash so Dumper prints one structure.
print Dumper({ extract($block) });
print Dumper({ extract($block2) });

Source

2013-08-19 19:59:16 daxim

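I don't think a single regex can do this, and I wouldn't want to support it even if it were possible.

With multiple regexes, you could easily use something like: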

(\d+) runts 
(\d+) input errors 
...etc...

A simple array of attribute names and a loop would solve this quickly and with very little fuss.
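As a rough sketch of that array-and-loop idea, the following builds one CSV row per block and fills in na for labels that are missing. The field list and the `extract_fields` name are illustrative assumptions, not part of the question.

```perl
use strict;
use warnings;

# Field labels to look for; this particular list is only an illustration.
my @fields = ('runts', 'giants', 'input errors', 'jumbo packets', 'collision');

# Return one value per entry in @fields, 'na' where a label is missing.
sub extract_fields {
    my ($block) = @_;
    my @values;
    for my $field (@fields) {
        push @values, $block =~ /\b(\d+)\s+\Q$field\E\b/ ? $1 : 'na';
    }
    return @values;
}

my $block = <<'END';
Vlan2 is up, line protocol is up
    0 runts, 0 giants, 0 throttles
    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
END

my ($name) = $block =~ /^(\S+)/;
print join(',', 'interface', @fields), "\n";
print join(',', $name, extract_fields($block)), "\n";   # Vlan2,0,0,0,na,na
```

A plain `join` is enough here because none of the values can contain a comma; for less predictable data a CSV module such as Text::CSV would be the safer choice.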

If you can strip the input down into smaller blocks with some preprocessing, you will be less likely to get false positives.
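To illustrate the false-positive risk: "bytes" occurs on several lines of the first sample block, so a search over the whole block returns the wrong counter. One possible form of preprocessing, shown here only as a sketch, is to narrow the search to the line that carries a known anchor phrase (choosing "packets input" as the anchor is an assumption).

```perl
use strict;
use warnings;

my $block = <<'END';
Vlan2 is up, line protocol is up
    L3 out Switched: ucast: 17925 pkt, 23810209 bytes mcast: 0 pkt, 0 bytes
    33374 packets input, 13154058 bytes, 0 no buffer
    3094286 packets output, 311981311 bytes, 0 underruns
END

# A naive search over the whole block hits the "L3 out Switched" line first.
my ($naive) = $block =~ /\b(\d+)\s+bytes\b/;

# Narrow the haystack to the one line that mentions "packets input",
# then run the same pattern against just that line.
my ($line) = grep { /packets input/ } split /\n/, $block;
my ($scoped) = defined($line) ? $line =~ /\b(\d+)\s+bytes\b/ : ();
$scoped = 'na' unless defined $scoped;

print "naive:  $naive\n";    # naive:  23810209
print "scoped: $scoped\n";   # scoped: 13154058
```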

Source

2013-08-19 20:00:20 JDB

A single regex isn't required; multiple queries are enough. – jyaworski

Here is one way to do it in awk, but it would need a lot of tweaking to work perfectly. But, again: use SNMP.

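awk '{
    # Print the interface name without treating it as a printf format string.
    printf "%s", $1
    for (i=1; i<=NF; i++) {
        if ($i" "$(i+1) ~ /Input queue:/) printf ",%s", $(i+2)
        if ($i ~ /runts/) printf ",%s", $(i-1)
        if ($i ~ /multicast,/) printf ",%s", $(i-1)
    }
    print ""
}' RS="swapped out" file
# Note: a multi-character RS is treated as a regex by gawk; POSIX leaves it unspecified.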

Source

2013-08-19 20:44:33 Jotne
