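Mailgun report to CSV format in Perl

I have a question. I want to write a Perl script that parses Mailgun output into CSV format. I assume the `split` and `join` functions could be used for this. Here is some sample data:

Sample data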

{
    "geolocation": {
        "city": "Random City",
        "region": "State",
        "country": "US"
    },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}

{
    "geolocation": {
        "city": "Random City2",
        "region": "State2",
        "country": "mEXICO"
    },
    "url": "https://www4.website2.com/register/ABCDE567",
    "timestamp": "1237854980723.0239847"
}

Desired output

"city","region","country","url","timestamp"

"Random City","State","US","https://www4.website.com/register/1234567","1237854980723.0239847"

"Random City_2","State_2","mEXICO","www4.website2.com/ABCDE567","1234.jpg",URL: http://www4.website2.com/ABCDE567,,"1237854980723.0239847_2"

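My goal is to turn my sample data into a comma-separated CSV file. I'm not sure how to approach this. Normally I would try to hack it together with a series of one-liners in a batch file, but I would prefer a Perl script. The real data will contain more fields, but once I figure out how to parse the general structure I'll be fine.

Here is what I have in a batch file.

Code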

perl -p -i.bak -e "s/(,$|,+ +$|^.*?{$|^.*?}.*?$|^.*?],.*?$)//gi" file.txt 

    rem Removes all unnecessary characters and lines with { and }.^

    perl -p -i.bak -e "s/(^ +| +$)//gi" file.txt  

    perl -p -i.bak -e "s/^\n$//gi" file.txt 


rem Removes all blank lines in initial file. Next one-liner takes care of trailing and beginning 

rem whitespace. The file is nice and clean now. 

perl -p -e "s/(^\".*?\"):.*?$/$1/gi" file.txt > header.txt 

rem retains only header info and puts into 'header.txt'^

perl -p -e "s/^\".*?\": +(\".*?\"$)/$1/gi" file.txt > data.txt 

rem retains only data that is associated with each field. 

perl -p -i.bak -e "s/\n/,/gi" data.txt 

rem replaces new line character with ',' delimiter. 

perl -p -i.bak -e "s/^/\n/gi" data.txt 

rem drops data down a line 

perl -p -i.bak -e "s/\n/,/gi" header.txt 

rem replaces new line character with ',' delimiter. 

copy header.txt+data.txt report.txt 

rem copies both files together. Since there is the same amount of fields as there are data 

rem delimiters, the columns and headers match.

My output

"city","region","country","url","timestamp"

"Random City","State","US","https://www4.website.com/register/1234567",1237854980723.0239847

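This does the trick, but a condensed script would be better. Variations in the input will break this batch script, and I need something more solid. Any suggestions?

Source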

2014-08-27 JDE876

Use [JSON](https://metacpan.org/pod/JSON). – jm666 2014-08-27 21:38:48

You can use a Perl script with a single regular expression:

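#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;

# Sample input. The regex only looks at "key": "value" pairs,
# so the nesting under "geolocation" is ignored.
$_ = <<'TXT';
{
    "geolocation": {
        "city": "Random City",
        "region": "State",
        "country": "US"
    },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}
TXT

# Capture every quoted key followed by a quoted value (quotes included).
my @matches = /("[^"]+")\s*:\s*("[^"]+")/g;
my %hash = @matches;

# Hash order is not defined, so sort the keys for a stable column order.
my @keys = sort keys %hash;
say join(",", @keys);
say join(",", @hash{@keys});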

This outputs:

"city","country","region","timestamp","url" 
"Random City","US","State","1237854980723.0239847","https://www4.website.com/register/1234567"

Of course, if you want to read from standard input instead, replace the string definition with:

local $/ = undef; 
$_ = <>;

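If you want more robust code, I suggest first matching the blocks of data enclosed in braces, and then searching for key: value pairs inside each block.

I would write this program.pl file: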

#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;

# Slurp the whole input file into $_
local $/ = undef;
open my $fh, '<', $ARGV[0] or die $!;
$_ = <$fh>;
close $fh;

# Match every top-level { ... } group, allowing nested braces
my @groups = /((?&BRACKETED))
(?(DEFINE)
    (?<WORD>  [^\{\}]+)
    (?<BRACKETED> \s* \{ (?&TEXT)? \s* \})
    (?<TEXT>  (?: (?&WORD) | (?&BRACKETED))+)
)/gmx;

# Match all key: value pairs inside each group.
# The named DEFINE groups come back as undef, so skip those.
my @results;
for (grep { defined } @groups) {
    push @results, {/"([^"]+)"\s*:\s*("[^"]+")/g};
}

# For each result, print the fields we want, in order
for (@results) {
    say join ",", @$_{qw/city region country url timestamp/};
}

Then a batch file to call the script:

rem How to call it... 
@perl program.pl text.txt > report.txt

Source

2014-08-27 22:23:28 nowox

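I like your answer. It works the way I wanted, but please look at the edit I just made to my question. See the desired output and the re-edited sample data I provided. What if there are two sets of data? The CSV would then contain the header we extracted, followed by data row 1, data row 2, and so on. @coin – JDE876 2014-09-02 17:53:18

@JDE876 The second version of the script will output what you expect: two lines, one for each city. However, instead of using regular expressions to parse your data, I would suggest using a JSON parser. – nowox 2014-09-02 18:27:57

Is there any chance you could provide an example that replaces the regex with a JSON parser? @coin – JDE876 2014-09-02 19:17:25

Not to sniff at @coin's regex-fu at all, but the advantages of using CPAN modules include getting a more flexible solution and benefiting from edge-case handling that others have already worked out.

This solution uses the JSON module to parse your incoming data (I'm assuming it continues to look like JSON) and a CSV module to generate good-quality CSV, which will handle things like embedded quotes and commas in your data.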

use warnings; 
use strict; 

use JSON qw/decode_json/; 
use Text::CSV_XS; 

my $json_data_as_string = <<EOL; 
{ 
    "geolocation": { 
     "city": "Random City", 
     "region": "State", 
     "country": "US" 
    }, 
    "url": "https://www4.website.com/register/1234567", 
    "timestamp": "1237854980723.0239847" 
} 
EOL 

my $s = decode_json($json_data_as_string); 

my $csv = Text::CSV_XS->new({ binary => 1 }); 

$csv->combine(
    $s->{geolocation}{city}, 
    $s->{geolocation}{region}, 
    $s->{geolocation}{country}, 
    $s->{url}, 
    $s->{timestamp}, 
) || die $csv->error_diag;

print $csv->string, "\n";
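To illustrate the point about embedded quotes and commas, here is a minimal sketch. The field values are made up for the demonstration and do not come from Mailgun; the only calls used are `combine`, `string`, `parse`, `fields` and `error_input` from Text::CSV_XS.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new({ binary => 1 });

# Made-up values: one contains a comma, one contains double quotes.
my @fields = ('Random City, East', 'the "State"', 'US');

# combine() quotes and escapes each field as needed.
$csv->combine(@fields) or die "combine failed on: " . $csv->error_input;
my $line = $csv->string;
print "$line\n";

# Parsing the line again should give back the original values.
$csv->parse($line) or die "parse failed on: " . $csv->error_input;
my @back = $csv->fields;
print join(" | ", @back), "\n";
```

A hand-written `join(",", ...)` would split the first field in two as soon as it was read back, which is the kind of edge case the module takes care of.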

To read the data from a file into $json_data_as_string, you can use the code from @coin's solution.
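For input with several records, as in the question's sample data, something along these lines may work. It is only a sketch under a few assumptions: the records are plain JSON objects written back to back with nothing but whitespace between them, the column list is the five fields from the question, and `mailgun_json_to_csv` is a hypothetical helper name. It uses JSON::PP (which ships with Perl and offers the same incremental parser as JSON) so that `incr_parse` can pull out each object in turn.

```perl
use strict;
use warnings;

use JSON::PP;        # core module; JSON or JSON::XS offer the same incr_parse
use Text::CSV_XS;

# Hypothetical helper: convert back-to-back JSON objects into CSV text.
sub mailgun_json_to_csv {
    my ($text) = @_;

    # In list context, incr_parse returns every complete JSON value it finds.
    my @records = JSON::PP->new->incr_parse($text);

    my $csv  = Text::CSV_XS->new({ binary => 1, always_quote => 1 });
    my @rows = ([qw/city region country url timestamp/]);    # header row

    for my $r (@records) {
        my $geo = $r->{geolocation} || {};    # tolerate a missing block
        push @rows, [ @$geo{qw/city region country/}, @$r{qw/url timestamp/} ];
    }

    my $out = '';
    for my $row (@rows) {
        $csv->combine(@$row) or die "cannot build CSV row: " . $csv->error_input;
        $out .= $csv->string . "\n";
    }
    return $out;
}

my $sample = <<'END_JSON';
{
    "geolocation": { "city": "Random City", "region": "State", "country": "US" },
    "url": "https://www4.website.com/register/1234567",
    "timestamp": "1237854980723.0239847"
}
{
    "geolocation": { "city": "Random City2", "region": "State2", "country": "mEXICO" },
    "url": "https://www4.website2.com/register/ABCDE567",
    "timestamp": "1237854980723.0239847"
}
END_JSON

print mailgun_json_to_csv($sample);
```

To run it against a real file, the slurp idiom from @coin's answer (`local $/ = undef; $_ = <>;`) could supply the text in place of `$sample`. If the file is read as raw UTF-8 bytes, `JSON::PP->new->utf8` would be the closer match to what `decode_json` does.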

Source

2014-08-28 03:44:00
