2013-04-20 34 views
0

Perl - Store a regular expression in a variable and then use it in a match

I have a file that contains long lines like this:

XEP.101  :1804 000000:I:XEPInfoFormat:Status=ok:TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553:CallDuration=4334:ServedParty=724044024363999:ServedLocation=724:OtherParty=TIM:OtherLocation=tim.br:ServedZone=ZO00001:OtherZone=ZP32363:TariffZone=ZN1261:CUST_ID=58922505:CO_ID=58891164:account=8327813:MSISDN=554599836655:theoretical_cost_value=33.323525:BA_Line_Main_value=NA:Tariff=TM_PL5PR:FU_Packs_used=FU_PLWI2:SNCODE_FU=1350_1250_1_BA_FU_PLWI2_Byt_Internet2:MCs_used=NO:bcd=20100319,bcp=P1M:InputFilename=201304172345.000020:EipFilename=/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020:RtxFilename=/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml:BadrateFilename=/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp:FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660 

So I have this condition to match that line in Perl:

if ($line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(\S*):CallType=(\w*):CallStart=(\d*):CallDuration=(\d*):ServedParty=(\d*):ServedLocation=(\d*):OtherParty=(\w*):OtherLocation=(\w*):ServedZone=(\w*):OtherZone=(\w*):TariffZone=(\w*):CUST_ID=(\d*):CO_ID=(\d*):account=(\d*):MSISDN=(\d*):theoretical_cost_value=(\d*)\.(\d*):BA_Line_Main_value=(\w*):Tariff=(\w*):FU_Packs_used=(\w*):SNCODE_FU=(\w*):MCs_used=(\w*):bcd=(\d*),bcp=(\w*):InputFilename=(\d*)\.(\d*):EipFilename=\/\w*\/\w*\/\w*\/\w*\/\w*\/(\d*)\/(\d*)\.(\d*).*FILE=\/\w*\/\w*\/\w*\/\w*\/\w*\/(\d*)\/(\w*)\+(\w*)-(\d*)-(\d*)-(\w*).(\w*);TICKET=(\d*)/) { 

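That works for me: it matches and gives me the results. However, I would like to make it more flexible. For example, I want to match the whole line but supply the value of one field in the match as an option to my script (for example, the value after TID=). What I want to do is: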

use Getopt::Std; 
getopts("Ch:t:",\%opts); 

if ($opts{t}) { 
    $TIDS = $opts{t}; 
} else { 
    $TIDS = '/S*'; 
} 

So I tried to do it like this, replacing that part of my match with the variable $TIDS, which is set from the -t option of getopts:

if ($line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(${TIDS}) 

So, if I pass an argument with the -t option, like this:

perl-script.pl -t 888894343 

I want my whole regex to match like this:

if ($line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(888894343) 

But if I don't specify it, I want it to match like this:

if ($line =~ m/XEP.[0-9].*:(\d{4}) (\d{2})(\d{2})(\d{2}).*XEPInfoFormat:Status=(\w*):TID=(/S*) 

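I know that I could simply match (/S*) on every line and then add a simple if condition like the one below, but that way I lose performance, because there are a lot of lines like the one I gave as an example:

print "$line\n" if $6 eq $TIDS; 

So I would like to have a flexible match. Does anyone have any ideas? I tried using quotemeta, and putting single quotes and double quotes around my regex, but nothing worked.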


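You can only substitute inside the regex delimiters; the substituted value cannot supply the delimiters itself (Perl parses the regex first and then interpolates variables into it). You are also missing the closing delimiter entirely in your first version. – Barmar 2013-04-20 18:21:09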

Answers

0

If you are trying to use quotemeta on a variable, such as a command-line argument, you need to do something like this:

$foo = quotemeta($ARGV[0]); 
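
As a minimal sketch of how that might fit the -t option from the question, assuming the TID given on the command line should be matched literally: quotemeta escapes any regex metacharacters in the value, the fallback is '\S*', and qr// compiles the pattern once. The sub name build_tid_regex is hypothetical, and the pattern is shortened to three fields purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the line regex once, with an optional literal TID.
sub build_tid_regex {
    my ($tid) = @_;
    # quotemeta escapes regex metacharacters so the option is matched literally;
    # without a TID, fall back to "any run of non-whitespace characters".
    my $tid_re = defined $tid ? quotemeta($tid) : '\S*';
    # Shortened pattern (assumption): only Status, TID and CallType are captured.
    return qr/XEPInfoFormat:Status=(\w*):TID=($tid_re):CallType=(\w*)/;
}

# Shortened copy of the sample record, for illustration only.
my $line = 'XEP.101  :1804 000000:I:XEPInfoFormat:Status=ok:'
         . 'TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs';

my $any   = build_tid_regex(undef);
my $exact = build_tid_regex('00000000516F6161-000874C3-00003E19-62F2B0C6');
my $other = build_tid_regex('888894343');

print "any TID:   ", ($line =~ $any   ? "match, TID=$2" : "no match"), "\n";
print "exact TID: ", ($line =~ $exact ? "match" : "no match"), "\n";
print "other TID: ", ($line =~ $other ? "match" : "no match"), "\n";
```

If the option is meant to accept a regex fragment rather than a literal value, the quotemeta call would have to be dropped, at the cost of trusting whatever the user types.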

Hmm, I didn't get it working with quotemeta. I used something like this: if ($opts{t}) { $TIDS = $opts{t}; } else { $TIDS = "\\S\*"; } – 2013-04-22 15:18:22

0

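The main reason your code doesn't work is that you are using '/S*', which matches a slash followed by zero or more S characters, rather than '\S*', which matches zero or more non-whitespace characters.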
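
A small illustration of the difference, using a made-up fragment of a record. The sub name tid_capture is hypothetical; each pattern string is interpolated into the regex exactly as written.

```perl
use strict;
use warnings;

# Hypothetical helper: return what the interpolated pattern captures after
# "TID=", or undef when the line does not match at all.
sub tid_capture {
    my ($text, $pattern) = @_;
    return $text =~ /TID=($pattern):/ ? $1 : undef;
}

# Made-up fragment of a record, for illustration only.
my $text = 'TID=00000000516F6161:CallType=gprs';

for my $pattern ('/S*', '\S*') {
    my $captured = tid_capture($text, $pattern);
    if (defined $captured) {
        print "'$pattern' matched, captured '$captured'\n";
    }
    else {
        print "'$pattern' did not match\n";
    }
}
```

With '/S*' the regex requires a literal slash straight after "TID=", so ordinary records are rejected; with '\S*' the TID value is captured.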

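However, I think it would be better to split each record into fields using split /:/ rather than using a regex. Also, all the fields after the first four are name=value pairs, so it is convenient to put them into a hash for easy access. Then all you have to do is check if ($opts{t} eq $params{TID}) { ... }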

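This code demonstrates the idea. I have used Data::Dump to display the contents of the %params hash that is built. It isn't clear whether the information in the first four fields is useful, but I have extracted them into @params in case you need them.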

use strict; 
use warnings; 

use Data::Dump; 

my %opts = (t => 888894343); 

while (my $line = <DATA>) { 
    chomp $line; 
    my %params = $line =~ /([^:=]+)=([^:=]+)/g; 
    ddx \%params; 
    #next if $opts{t} and $params{TID} ne $opts{t}; 
    my @params = (split /:/, $line, 5)[0..3]; 
    ddx \@params; 
    #print $line; 
} 

__DATA__ 
XEP.101  :1804 000000:I:XEPInfoFormat:Status=ok:TID=00000000516F6161-000874C3-00003E19-62F2B0C6:CallType=gprs:CallStart=20130415210553:CallDuration=4334:ServedParty=724044024363999:ServedLocation=724:OtherParty=TIM:OtherLocation=tim.br:ServedZone=ZO00001:OtherZone=ZP32363:TariffZone=ZN1261:CUST_ID=58922505:CO_ID=58891164:account=8327813:MSISDN=554599836655:theoretical_cost_value=33.323525:BA_Line_Main_value=NA:Tariff=TM_PL5PR:FU_Packs_used=FU_PLWI2:SNCODE_FU=1350_1250_1_BA_FU_PLWI2_Byt_Internet2:MCs_used=NO:bcd=20100319,bcp=P1M:InputFilename=201304172345.000020:EipFilename=/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020:RtxFilename=/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml:BadrateFilename=/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp:FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660 

Output

# para.pl:11: { 
# account    => 8327813, 
# BA_Line_Main_value  => "NA", 
# BadrateFilename  => "/gold/rte/data/BadRate/bad_rate_xep10.201304172345.000020.tmp", 
# bcd     => "20100319,bcp", 
# CallDuration   => 4334, 
# CallStart    => 20130415210553, 
# CallType    => "gprs", 
# CO_ID     => 58891164, 
# CUST_ID    => 58922505, 
# EipFilename   => "/gold/rte/data/RatedEvents/EIP/10/101201304172345.000020", 
# FILE     => "/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET", 
# FU_Packs_used   => "FU_PLWI2", 
# InputFilename   => "201304172345.000020", 
# MCs_used    => "NO", 
# MSISDN     => 554599836655, 
# OtherLocation   => "tim.br", 
# OtherParty    => "TIM", 
# OtherZone    => "ZP32363", 
# RtxFilename   => "/gold/rte/data/RatedEvents/RTX/01/OPSCGOLD_20130418000000_1011917.xml", 
# ServedLocation   => 724, 
# ServedParty   => 724044024363999, 
# ServedZone    => "ZO00001", 
# SNCODE_FU    => "1350_1250_1_BA_FU_PLWI2_Byt_Internet2", 
# Status     => "ok", 
# Tariff     => "TM_PL5PR", 
# TariffZone    => "ZN1261", 
# theoretical_cost_value => 33.323525, 
# TID     => "00000000516F6161-000874C3-00003E19-62F2B0C6", 
# } 
# para.pl:14: [" XEP.101  ", "1804 000000", "I", "XEPInfoFormat"] 
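
The dump above shows that the /([^:=]+)=([^:=]+)/g pattern folds bcp into the value of bcd and TICKET into the value of FILE, because those pairs are separated by "," and ";" rather than ":". One possible variation, assuming that ":", "," and ";" never occur inside a value (true of the sample record, but an assumption in general), is to split on all three separators. The sub names parse_record and wanted are hypothetical, and the record is shortened for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one record into a hash of name => value pairs.
# Assumes ':', ',' and ';' only ever separate fields and never appear in a value.
sub parse_record {
    my ($line) = @_;
    my %params;
    for my $field (split /[:,;]/, $line) {
        # A limit of 2 keeps any further '=' inside the value.
        my ($name, $value) = split /=/, $field, 2;
        next unless defined $value;    # skip the leading fields without '='
        $value =~ s/\s+\z//;           # drop trailing whitespace or newline
        $params{$name} = $value;
    }
    return \%params;
}

# Hypothetical filter: true when no TID was requested or the TID is equal.
sub wanted {
    my ($params, $tid) = @_;
    return 1 unless defined $tid;
    return defined $params->{TID} && $params->{TID} eq $tid;
}

# Shortened copy of the sample record, for illustration only.
my $line = 'XEP.101  :1804 000000:I:XEPInfoFormat:Status=ok:'
         . 'TID=00000000516F6161-000874C3-00003E19-62F2B0C6:'
         . 'bcd=20100319,bcp=P1M:'
         . 'FILE=/gold/rte/data/IncomingCDRs/ASN1/010/GPRS99+GPRS99-46299-1304172357-SA.TTF;TICKET=660 ';

my $params = parse_record($line);
print "$_ => $params->{$_}\n" for sort keys %$params;
print "selected\n"
    if wanted($params, '00000000516F6161-000874C3-00003E19-62F2B0C6');
```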

That is quite an interesting suggestion for a solution; I will think about it. Thanks – 2013-04-22 15:17:23

0

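Another suggestion: there is no need to check the value of TID and parse the line in a single step. You can do a quick check on the record first and then parse it (using the hash technique or a regex) only if it is of interest.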

while (my $line = <>) { 
    # Skip records whose TID differs from the -t option, if one was given
    next if $opts{t} and $line !~ /:TID=$opts{t}:/; 
    # Parse and process record 
} 
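
If several fields should be selectable this way, the same quick check could be built from a hash of wanted values. This is only a sketch, assuming each wanted value is matched literally and that every such field is delimited by ":" on both sides, as TID is in the sample record. The sub names build_prefilters and passes, the example values, and the two shortened records are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: one precompiled regex per requested field.
sub build_prefilters {
    my (%wanted) = @_;
    return map {
        my ($name, $value) = ($_, $wanted{$_});
        # \Q...\E quotes metacharacters so names and values match literally.
        qr/:\Q$name\E=\Q$value\E:/;
    } sort keys %wanted;
}

# True when the line passes every filter (always true with no filters).
sub passes {
    my ($line, @filters) = @_;
    for my $re (@filters) {
        return 0 if $line !~ $re;
    }
    return 1;
}

# Example values; in the real script they would come from getopts.
my @filters = build_prefilters(TID => '888894343', CallType => 'gprs');

# Made-up, shortened records for illustration only.
my @lines = (
    'XEP.101  :1804 000000:I:XEPInfoFormat:Status=ok:TID=888894343:CallType=gprs:CallStart=20130415210553',
    'XEP.101  :1804 000000:I:XEPInfoFormat:Status=ok:TID=777:CallType=gprs:CallStart=20130415210553',
);

for my $line (@lines) {
    next unless passes($line, @filters);
    print "$line\n";    # parse and process the record here
}
```

The filters are compiled once before the loop, so each record costs only a few cheap matches before the expensive full parse.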

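Hmm... the thing is that the intention is to make the match for every field in my line flexible, so that each one can be set through getopts, so I think I need another solution – 2013-04-22 14:37:45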
