2015-12-14 178 views
-2

Perl script to process a text file

I have a file in the following format:

Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_Identifier = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_Node = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 

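Each set of data begins with a line starting with "Status_" and ends with a "RawCaptureTimeStamp" line, and the sets are separated by two blank lines.

The problem is that, in a non-ideal case, the file can look like this instead: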

1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 

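As you can see, the first and last data sets above are invalid. I need logic that removes these unwanted data sets from the original file and saves it again. I tried a few things in Perl, but they all failed. Please help. The code below is what I am using to read the file: it checks whether the file begins with Status and, if not, reads until it reaches RawCaptureTimeStamp. The case where the file does not end correctly is not handled in it.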

while(my $line = <$cap_1>){ 
    if($. == 1 && $line !~ /^Status/){ #check if first line doesn't begin with status 
      while($line = <$cap_1>){#if not read till the occurence of RawCaptureTimeStamp 
      if($line =~/^RawCaptureTimeStamp/){ 
       $. = $.+1; 
       last; 
      } 
     } 
     $line = <$cap_1>; 
     if (eof()){ #After reading till raw capture timestamp, check for EOF 
      last; 
     } 
    } 
} 
+1

So blank lines always separate blocks? Read the file in paragraph mode (for example, set `$/ = "\n\n";`) and then analyze each paragraph.

+2

You say you have tried a few things. Where is the code?

+0

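I have written a solution for you because it interested me and took me into a corner of Perl that I wanted to revise. However, [**Matt Jacob**](http://stackoverflow.com/questions/34270575/perl-script-to-process-a-text-file#comment56285293_34270575) is right: Stack Overflow is here to provide a library of solutions to common problems. It is not a place to come when you would rather stamp your feet than do some work. – Borodin

Answers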

2

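I would just read the file in paragraph mode (setting `$/` to `""`, not to `"\n\n"` as Jonathan Leffler commented) and check each paragraph for consistency.

Three newlines have to be put back at the end of each block, because PerlIO normalises them to two in this mode.

It looks like the problem is that the data can be truncated at either end, so I require a ten-digit timestamp, which covers dates from 2001 to 2286.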

use strict; 
use warnings 'all'; 

local $/ = ''; # Separate reads by one or more blank lines 

while (<>) { 

    next unless /^Status.+\nStatus/ and /^RawCaptureTimeStamp = \d{10}/m; 
    s/\s*\z/\n\n\n/; 

    print; 
} 

Output (using your faulty sample data set)

Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 
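
The ten-digit range mentioned in the answer can be checked directly. The following is only an illustrative sketch and not part of the original answer: the helper name `epoch_to_date` is made up for the example, and it assumes a Perl whose `gmtime` handles values beyond 2038 (5.12 or later).

```perl
use strict;
use warnings;

# Hypothetical helper: format an epoch value as a UTC calendar date.
sub epoch_to_date {
    my ($epoch) = @_;
    my (undef, undef, undef, $mday, $mon, $year) = gmtime $epoch;
    return sprintf '%04d-%02d-%02d', $year + 1900, $mon + 1, $mday;
}

print epoch_to_date(1_000_000_000), "\n";   # smallest ten-digit value
print epoch_to_date(9_999_999_999), "\n";   # largest ten-digit value
print epoch_to_date(1_450_091_580), "\n";   # timestamp from the sample data
```

A nine-digit value would fall before September 2001 and an eleven-digit one after November 2286, so `\d{10}` acts as a cheap sanity check that the timestamp line was not cut short.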
+0

This works perfectly. Even the consideration of the date and time is a plus.

0
#! /usr/bin/perl 
use warnings; 
use strict; 

$_ = q(); 
$_ = <> until /^Status_/; # Skip the invalid beginning; 

my $block = $_; 

while (<>) { 
    if (/^RawCaptureTimeStamp/) { # End of block: print it, start gathering a new one. 
     print $block, $_; 
     $block = q(); 

    } else {      # Inside of a block. 
     $block .= $_; 
    } 
} 

The last block will not be printed.

+0

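This will pass any block that starts with the *second* `Status_` line. You are also assuming that the errors are always in the first and last blocks, which I agree is likely, but it is unconfirmed, given that the file looks like a snapshot of a data stream and so may start and end anywhere within a block – Borodin

+0

Also, you should check that `RawCaptureTimeStamp` is followed by a sensible *value* – Borodin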

0

This works, I believe:

#!/usr/bin/env perl 
use strict; 
use warnings; 

$/ = "\n\n"; 

while (<>) 
{ 
    s/^\s+//; 
    s/\s+$//; 
    print "\n[[", $_, "]]\n" 
     if (m/^Status_\w+ .*Status_\w+ /ms && m/^RawCaptureTimeStamp /m); 
} 

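Setting `$/` this way reads up to a double newline (or EOF), effectively reading one paragraph at a time. The `if` condition looks for two `Status_` elements and a `RawCaptureTimeStamp`; you can refine these conditions as needed to make them stricter. The `s` modifier allows `.*` to match embedded newlines; the `m` modifier enables multi-line mode. As written, for example, this would accept a `RawCaptureTimeStamp` line followed by other lines.

Sample data, copied from the question: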
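
The effect of the two modifiers can be seen in isolation with a small sketch. It is not part of the original answer; the shortened `$block` value is an assumption made up for the demonstration, while the patterns are the ones used in the code above.

```perl
use strict;
use warnings;

# A shortened block in the same layout as the sample data (assumed for the demo).
my $block = join "\n",
    'Status_ArsFlag = ""',
    'Status_NodeAlias = ""',
    '1 = "NNMi"',
    'RawCaptureTimeStamp = 1450091580',
    '';

# Without /s, "." stops at a newline, so ".*" cannot span the two Status_ lines.
my $without_s = $block =~ /^Status_\w+ .*Status_\w+ /m  ? 1 : 0;
# With /s, ".*" may cross line boundaries.
my $with_s    = $block =~ /^Status_\w+ .*Status_\w+ /ms ? 1 : 0;

# Without /m, "^" matches only at the very start of the string.
my $without_m = $block =~ /^RawCaptureTimeStamp /  ? 1 : 0;
# With /m, "^" also matches at the start of every line.
my $with_m    = $block =~ /^RawCaptureTimeStamp /m ? 1 : 0;

print "two Status_ lines: without /s = $without_s, with /s = $with_s\n";
print "timestamp line:    without /m = $without_m, with /m = $with_m\n";
```

Both checks succeed only when the corresponding modifier is present, which is why the condition needs `/ms` for the first pattern and `/m` for the second.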

Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_Identifier = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_Node = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 

Sample output:

[[Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580]] 

[[Status_Identifier = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580]] 

[[Status_Node = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580]] 

[[Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580]] 
+0

`$/ = "\n\n"` is *not* the way to set paragraph mode in PerlIO. Putting square brackets into what should be a demonstration of your solution is also a little shabby – Borodin

+2

This is Perl: TMTOWTDI.

+0

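The problem with this, which is illustrated here, is that `$/ = "\n\n"` can return records that contain just two newlines or (as it seems you have discovered) records that *start* with newlines. All of that is taken care of by `$/ = ""` – Borodin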
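
The difference described in this comment can be reproduced with a short sketch. It is not taken from the thread: the helper `read_records` and the three-record test string are assumptions made up for illustration, and the data is read from an in-memory filehandle rather than a file.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text from an in-memory handle using the
# given input record separator and return the records as a list.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Blocks separated by two blank lines, as in the question.
my $data = "a\n\n\nb\n\n\nc\n";

for my $separator ("\n\n", '') {
    my @records = read_records($data, $separator);
    (my $label = $separator) =~ s/\n/\\n/g;     # make newlines visible
    print qq{\$/ = "$label"\n};
    for my $record (@records) {
        (my $shown = $record) =~ s/\n/\\n/g;
        print qq{  "$shown"\n};
    }
}
```

With `"\n\n"` the second and third records begin with a leftover newline, because the separator consumes only two of the three newlines between blocks. In paragraph mode (`""`) any run of blank lines is treated as a single separator, so every record starts at the first line of its block.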

0

Use Perl's paragraph mode:

#!/usr/bin/perl -w 

use strict; 

local $/ = ""; 

while (my $para = <DATA>) { 
    print $para if ($para =~ /^Status_.*RawCaptureTimeStamp/s); 
} 

__DATA__ 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
2 = "ASB" 
3 = "456" 
RawCaptureTimeStamp = 1450091580 


Status_ArsFlag = "" 
Status_NodeAlias = "" 
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1" 
1 = "NNMi" 
+0

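This does not take into account the possibility that data is missing up to the *second* `Status_` line. It will also pass a paragraph that ends right after `RawCaptureTimeStamp`, with no value. It will also reduce the number of blank lines between blocks from two to one. You should also prefer `use warnings 'all'` to `-w` on the command line or shebang line – Borodin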

+2

Thanks, it is never too late to improve – klashxx

+0

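As @Borodin said, I made the same observations, but it is an interesting approach that I was not aware of. It might be useful for other things in the future.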