2017-04-08

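Flattening data using regex

I am using a program that pings servers and returns the results. The result data, in text format, is laid out the opposite way from the CSV format I need. I'm weary of regular expressions, and it seems to me that flattening this data would be very complex.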

    1.2. 3. 4 | Min | Avg | Max |Std.Dev|Reliab%| 
    ----------------+-------+-------+-------+-------+-------+ 
    + Cached Name | 0.000 | 0.000 | 0.000 | 0.000 | 100.0 | 
    + Uncached Name | 0.040 | 0.100 | 0.250 | 0.065 | 100.0 | 
    + DotCom Lookup | 0.049 | 0.121 | 0.182 | 0.040 | 100.0 | 
    ---<-------->---+-------+-------+-------+-------+-------+ 
        sub.domain.com 
       Some Provider, LLC 

    5.6. 7. 8 | Min | Avg | Max |Std.Dev|Reliab%| 
    ----------------+-------+-------+-------+-------+-------+ 
    + Cached Name | 0.000 | 0.000 | 0.000 | 0.000 | 100.0 | 
    + Uncached Name | 0.040 | 0.100 | 0.250 | 0.065 | 100.0 | 
    + DotCom Lookup | 0.049 | 0.121 | 0.182 | 0.040 | 100.0 | 
    ---<-------->---+-------+-------+-------+-------+-------+ 
        bus.domain2.net 
       Some Other Provider, Inc 

And so on.

Here is the data I would like to extract using regex/sed:

Domain,Company,IP,Cached Name Min,Cached Name Max,Cached Name Avg,Cached Name Std.Dev,Cached Name Reliab%,IP,Uncached Name Min,Uncached Name Max,Uncached Name Avg,Uncached Name Std.Dev,Uncached Name Reliab%,IP,Cached Name Min,Cached Name Max,Cached Name Avg,Cached Name Std.Dev,Cached Name Reliab%,IP,DotCom Lookup Min,DotCom Lookup Max,DotCom Lookup Avg,DotCom Lookup Std.Dev,DotCom Lookup Reliab% 
sub.domain.com,Some Provider - LLC,1.2.3.4,0.000,0.000,0.000,0.000,100.0,0.040,0.250,0.100,0.065,100.0,0.049,0.182,0.121,0.040,100 
bus.domain2.net,Some Other Provider - Inc,5.6.7.8,0.000,0.000,0.000,0.000,100.0,0.040,0.250,0.100,0.065,100.0,0.040,0.250,0.100,0.065,100.0,0.049,0.182,0.121,0.040,100.0 

Is this use case too complex for regex/sed? Does anyone have any clue how I would achieve this?

Depends on how you define "complex". It is certainly doable, but it will result in a very long regular expression. – Vallentin

[Can it be done? Yes. Should it be done? Absolutely not.](https://regex101.com/r/tdCtpR/1/) – Vallentin

Answer

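Using sed for this is probably not the best choice, but sometimes circumstances or preferences override that consideration.

So here is a sed solution: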

sed -En "s/^\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+)\s*\|.*$/\1\2\3\4\,/;T;{N;N;N;N;s/\n[^|]+\|//g;s/ \| /,/g;s/ \|//;x;N;z;N;N;s/,/ -/g;G;s/\n\s*/,/g;s/^,//;p}" input.txt 
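  • Look for a line with an IP such as "1.2.3.4";
    try to extract the IP and, if that fails, move on to the next line
    s/^\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+)\s*\|.*$/\1\2\3\4\,/;T;
  • On success, T does not branch and the block is entered
    T;{
  • Fetch the next few lines and delete (or replace with ",") a lot of unneeded material
    N;N;N;N;s/\n[^|]+\|//g;s/ \| /,/g;s/ \|//;
  • Store the result in hold space and ignore the following line
    x;N;z;
  • Fetch the next lines and sanitise the "," in them
    N;N;s/,/ -/g;
  • Append what is stored in hold space
    G;
  • Some cosmetics to get "," (only) in the right places
    s/\n\s*/,/g;s/^,//;
  • Print, done
    p}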
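
For readability, the steps above can be sketched as a multi-line script with one comment per step. This is only a reformatting of the same commands, wrapped in a shell function whose name (`flatten`) is made up for illustration. It assumes GNU sed, since `T`, `z` and `\s` are GNU extensions, and input lines without trailing spaces.

```shell
#!/bin/sh
# flatten: hypothetical wrapper name; reads the named files, or stdin if none.
flatten() {
  sed -En '
    # Header line: collapse "1.2. 3. 4 | Min | ..." into "1.2.3.4,"
    s/^\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+\.)\s*([[:digit:]]+)\s*\|.*$/\1\2\3\4,/
    # No substitution happened, so this is not a header: start the next cycle
    T
    {
      # Pull in the separator line and the three data rows
      N;N;N;N
      # Drop each newline plus everything up to the first "|" that follows it
      s/\n[^|]+\|//g
      # Turn the remaining column separators into commas
      s/ \| /,/g
      s/ \|//
      # Park the IP and numbers in hold space
      x
      # Read the closing separator line and throw it away
      N;z
      # Read the domain and company lines; replace commas inside them
      N;N
      s/,/ -/g
      # Append the parked IP and numbers
      G
      # Join everything with commas and drop the leading one
      s/\n\s*/,/g
      s/^,//
      p
    }
  ' "$@"
}

# Demo on one record taken from the question
flatten <<'EOF'
    1.2. 3. 4 | Min | Avg | Max |Std.Dev|Reliab%|
    ----------------+-------+-------+-------+-------+-------+
    + Cached Name | 0.000 | 0.000 | 0.000 | 0.000 | 100.0 |
    + Uncached Name | 0.040 | 0.100 | 0.250 | 0.065 | 100.0 |
    + DotCom Lookup | 0.049 | 0.121 | 0.182 | 0.040 | 100.0 |
    ---<-------->---+-------+-------+-------+-------+-------+
        sub.domain.com
       Some Provider, LLC
EOF
```

For a file, `flatten input.txt` should behave like the one-liner.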

Output:

sub.domain.com,Some Provider - LLC,1.2.3.4, 0.000,0.000,0.000,0.000,100.0,0.040,0.100,0.250,0.065,100.0,0.049,0.121,0.182,0.040,100.0 
bus.domain2.net,Some Other Provider - Inc,5.6.7.8, 0.000,0.000,0.000,0.000,100.0,0.040,0.100,0.250,0.065,100.0,0.049,0.121,0.182,0.040,100.0 

This is very close to the desired output, except for a " " (space) after "1.2.3.4,".
Is this a problem?
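
If the space does matter, one possible fix is an extra substitution that removes the first ", " on each output line. A minimal sketch, assuming the company name no longer contains a comma (the script already replaces those with " -"); the function name `strip_space` is hypothetical:

```shell
#!/bin/sh
# strip_space: hypothetical helper; removes only the first ", " on each line.
strip_space() {
  sed 's/, /,/'
}

# Demo on a shortened output line
printf '%s\n' 'sub.domain.com,Some Provider - LLC,1.2.3.4, 0.000,0.000' | strip_space
```

The same `s/, /,/` could presumably also be placed just before `p` inside the sed script instead of running a second sed.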

(Actually, I have doubts that the desired output matches the sample input; could you double-check?)