2013-04-10 42 views
2

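Converting a text file to CSV format with awk/Perl

I have a historical, automatically generated log file in the following format, which I would like to convert to a CSV file before uploading it to a database.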

-------------------------------------- 
Thu Jul 8 09:34:12 BST 2010 
BLUE Head 1 
Duration = 20 s 
Activity = 14.9 MBq 
Sensitivity = 312 cps/MBq 
-------------------------------------- 
Thu Jul 8 09:34:55 BST 2010 
BLUE Head 1 
Duration = 20 s 
Activity = 14.9 MBq 
Sensitivity = 318 cps/MBq 
-------------------------------------- 
Thu Jul 8 10:13:39 BST 2010 
RED Head 1 
Duration = 20 s 
Activity = 14.9 MBq 
Sensitivity = 307 cps/MBq 
-------------------------------------- 
Thu Jul 8 10:14:10 BST 2010 
RED Head 1 
Duration = 20 s 
Activity = 14.9 MBq 
Sensitivity = 305 cps/MBq 
-------------------------------------- 
Mon Jul 19 10:11:18 BST 2010 
BLUE Head 1 
Duration = 20 s 
Activity = 12.4 MBq 
Sensitivity = 326 cps/MBq 
-------------------------------------- 
Mon Jul 19 10:12:09 BST 2010 
BLUE Head 1 
Duration = 20 s 
Activity = 12.4 MBq 
Sensitivity = 333 cps/MBq 
-------------------------------------- 
Mon Jul 19 10:13:57 BST 2010 
RED Head 1 
Duration = 20 s 
Activity = 12.4 MBq 
Sensitivity = 338 cps/MBq 
-------------------------------------- 
Mon Jul 19 10:14:45 BST 2010 
RED Head 1 
Duration = 20 s 
Activity = 12.4 MBq 
Sensitivity = 340 cps/MBq 
-------------------------------------- 

I would like to convert the log file to CSV in the following format:

Date,Camera,Head,Duration,Activity 
08/07/10,BLUE,1,20,14.9 
08/07/10,BLUE,1,20,14.9 
08/07/10,RED,1,20,14.9 
08/07/10,RED,1,20,14.9 

I used awk, which gets me close to what I want:

awk 'BEGIN {print "Date,Camera,Head,Duration,Activity";RS = "--------------------------------------"; FS="\n";}; {OFS=",";split($3, a, " ");split($4,b, " "); split($5,c," ");print $2,a[1],a[3],b[3],c[3]}' sensitivity.txt > sensitivity.csv 

This gives me:

Date,Camera,Head,Duration,Activity 
,,,, 
Thu Jul 8 09:34:12 BST 2010,BLUE,1,20,14.9 
Thu Jul 8 09:34:55 BST 2010,BLUE,1,20,14.9 
Thu Jul 8 10:13:39 BST 2010,RED,1,20,14.9 
Thu Jul 8 10:14:10 BST 2010,RED,1,20,14.9 

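How can I

(a) get rid of the line of four field separators (,,,,) in the output
(b) convert the date format from Thu Jul 8 09:34:12 BST 2010 to DD/MM/YY (I could do this in pure awk or by piping to perl)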

+0

For the date conversion, take a look at the first answer here: http://stackoverflow.com/questions/2121896/converting-dates-in-awk – marmottus 2013-04-10 10:30:28

+0

And for the useless commas, just check the values of $2, a, b, c, etc. (if ($2) { print ... }) – marmottus 2013-04-10 10:34:19

+4

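I don't think I have seen anyone ask to convert a 4-digit year to a 2-digit year since the year 2000 rolled around. Seriously consider using a YYYYMMDD date format so that you can tell 1999 from 2099 and sort your data by date easily. – 2013-04-10 12:46:09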

Answers

1

This straightforward awk script will do the job:

BEGIN { 
    n=split("Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec",month,"|") 
    for (i=1;i<=n;i++) { 
     month_index[month[i]] = i 
    } 
    print "Date,Camera,Head,Duration,Activity" 
} 
/^-*$/{ 
    i=0 
    next 
} 
{ 
    i++ 
} 
i==1{ 
    printf "%02d/%02d/%02d,",$3,month_index[$2],substr($6,3) 
} 
i==2{ 
    printf "%s,%d,",$1,$3 
} 
i==3{ 
    printf "%d,",$3 
} 
i==4{ 
    printf "%.1f\n",$3 
} 

Output:

$ awk -f script.awk file 
Date,Camera,Head,Duration,Activity 
08/07/10,BLUE,1,20,14.9 
08/07/10,BLUE,1,20,14.9 
08/07/10,RED,1,20,14.9 
08/07/10,RED,1,20,14.9 
19/07/10,BLUE,1,20,12.4 
19/07/10,BLUE,1,20,12.4 
19/07/10,RED,1,20,12.4 
19/07/10,RED,1,20,12.4 
+2

Thanks, a lovely solution – moadeep 2013-04-10 10:53:57

+0

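Your month array will contain 24 entries instead of 12. It will work as long as the OP doesn't want to print all the months. Consider using 2 arrays, monthNr2Nm and monthNm2Nr. – 2013-04-10 12:49:17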

+0

@EdMorton I know, I was just being lazy. I didn't see the OP wanting to iterate over the array, but I changed it just in case. – 2013-04-10 12:53:04
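
A minimal sketch of the two-array suggestion from the comment above, run as a BEGIN-only awk program from the shell. The array names monthNr2Nm and monthNm2Nr come from the comment; the wrapper function month_maps is a hypothetical name used only for illustration.

```shell
# month_maps is a hypothetical wrapper; only the array names come from the comment.
month_maps() {
  awk 'BEGIN {
    # number -> name: split() fills indices 1..12
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", monthNr2Nm, " ")
    # name -> number: kept in a separate array, so neither array holds mixed keys
    for (i = 1; i <= n; i++) monthNm2Nr[monthNr2Nm[i]] = i
    print monthNm2Nr["Jul"]   # look up a number by name
    print monthNr2Nm[7]       # look up a name by number
    # count the entries in the number -> name array
    count = 0
    for (k in monthNr2Nm) count++
    print count
  }'
}
month_maps
```

Keeping the two directions apart means that looping over monthNr2Nm visits exactly twelve entries, which is the concern raised in the comment.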

2

@sudo_O's answer is fine, but here is an alternative:

$ cat tst.awk 
BEGIN{ RS="---+\n"; OFS=","; months="JanFebMarAprMayJunJulAugSepOctNovDec" } 
NR==1{ print "Date","Camera","Head","Duration","Activity"; next } 
{ print sprintf("%04d%02d%02d",$6,(match(months,$2)+2)/3,$3),$7,$9,$12,$16 } 

$ gawk -f tst.awk file 
Date,Camera,Head,Duration,Activity 
20100708,BLUE,1,20,14.9 
20100708,BLUE,1,20,14.9 
20100708,RED,1,20,14.9 
20100708,RED,1,20,14.9 
20100719,BLUE,1,20,12.4 
20100719,BLUE,1,20,12.4 
20100719,RED,1,20,12.4 
20100719,RED,1,20,12.4 

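Note that I used GNU awk above so that I could set RS to more than a single character. With other awks, just convert all the "---..." lines to blank lines, a control character, or something else, and then set RS accordingly before running the script.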
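
One way this workaround for other awks might look, as a sketch rather than a tested drop-in: sed turns every dashed line into an empty line, and awk then reads the records in paragraph mode (RS=""). The function names sample_log and to_csv_portable are hypothetical, the sample holds two records copied from the log in the question, and index() is used here in place of match() as a literal-string lookup.

```shell
# sample_log is a hypothetical helper that prints two records from the question.
sample_log() {
cat <<'EOF'
--------------------------------------
Thu Jul 8 09:34:12 BST 2010
BLUE Head 1
Duration = 20 s
Activity = 14.9 MBq
Sensitivity = 312 cps/MBq
--------------------------------------
Mon Jul 19 10:13:57 BST 2010
RED Head 1
Duration = 20 s
Activity = 12.4 MBq
Sensitivity = 338 cps/MBq
--------------------------------------
EOF
}

# to_csv_portable is a hypothetical name; it reads the log on standard input.
to_csv_portable() {
  # Step 1: replace each dashed separator line (with optional trailing blanks) by an empty line.
  sed 's/^--*[[:space:]]*$//' |
  # Step 2: RS="" selects paragraph mode, so blank lines separate the records.
  awk 'BEGIN { RS = ""; OFS = ","; months = "JanFebMarAprMayJunJulAugSepOctNovDec"
               print "Date", "Camera", "Head", "Duration", "Activity" }
       { print sprintf("%04d%02d%02d", $6, (index(months, $2) + 2) / 3, $3), $7, $9, $12, $16 }'
}

sample_log | to_csv_portable
```

Because paragraph mode ignores leading blank lines, there is no empty first record to skip, so the header is printed in BEGIN rather than on NR==1.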

If you don't like the date format I suggested, adjust the sprintf() to suit.
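
For instance, the DD/MM/YY form asked for in the question could be produced by changing the format string and the argument order. The sketch below applies that to a single date line; fmt_date is a hypothetical helper name, and index() is used instead of match() so the month is looked up as a literal string.

```shell
# fmt_date is a hypothetical helper: it reads date lines such as
# "Thu Jul 8 09:34:12 BST 2010" on standard input and prints DD/MM/YY.
fmt_date() {
  awk 'BEGIN { months = "JanFebMarAprMayJunJulAugSepOctNovDec" }
       # $3 = day of month, $2 = month name, $6 = 4-digit year
       { printf "%02d/%02d/%02d\n", $3, (index(months, $2) + 2) / 3, $6 % 100 }'
}

echo 'Thu Jul 8 09:34:12 BST 2010' | fmt_date
```

The month number falls out of the position of the three-letter name in the months string: "Jul" starts at character 19, and (19 + 2) / 3 is 7.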

1

I thought I would show how to actually parse the input, rather than just performing string transformations.

#! /usr/bin/env perl 
use strict; 
use warnings; 
use Date::Parse; 
use Date::Format; 
use Text::CSV; 

sub convert_date{ 
    my $time = str2time($_[0]); 
    # iso 8601 style: 
    return time2str('%Y-%m-%d',$time); # YYYY-MM-DD 

    # or the outdated style output you wanted 
    return time2str('%d/%m/%y',$time); # DD/MM/YY 
} 

my %multiply_table = (
    s => 1, 
    m => 60, 
    h => 60 * 60, 
    d => 60 * 60 * 24, 
); 
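sub convert_duration{ 
    my($d,$s) = $_[0] =~ /^ \s* (\d+) \s* (\w) \s* $/x; 
    # test for a match and a known unit, so that a duration of "0 s" is accepted 
    die "Invalid duration '$_[0]'" unless defined $d && exists $multiply_table{$s}; 
    return $d * $multiply_table{$s}; 
} 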

my @field_list = qw'Date Camera Head Duration Activity'; 

my $csv = Text::CSV->new({ eol => "\n" }); 

# print header 
$csv->print(\*STDOUT, \@field_list); 

# set record separator 
local $/ = ('-' x 38) . "\n"; 

# parse data 
while(<>){ 
    chomp; # remove record separator 
    next unless $_; # skip empty section 
    my($time,$camdat,@fields) = split m/\n/; # split up the fields 

    my %data; 


    # split camera and head fields 
    @data{qw(Camera Head)} = split /\s+Head\s+/, $camdat; 

    # parse lines like: 
    # Duration = 20 s 
    # Activity = 14.9 MBq 
    # Sensitivity = 305 cps/MBq 
    for(@fields){ 
        my($key,$value) = /(\w+) \s* = \s* (.*) /x; 
        $data{$key} = $value; 
    } 

    # at this point we start reducing precision 

    $data{Date} = convert_date($time); 

    # remove measurement units 
    $data{Duration} = convert_duration($data{Duration}); # safe 
    $data{Activity} =~ s/[^\d]*$//; # unsafe 

    $csv->print(\*STDOUT, [@data{@field_list}]); 
}