Extract floats from lines of text with irregular surrounding text

I have a text file with the following contents:

[silencedetect @ 0x7fa73cd000c0] silence_start: 1.32515 
[silencedetect @ 0x7fa73cd000c0] silence_end: 1.88188 | silence_duration: 0.556735 
[silencedetect @ 0x7fa73cd000c0] silence_start: 2.99698 
[silencedetect @ 0x7fa73cd000c0] silence_end: 3.42311 | silence_duration: 0.426122 
[silencedetect @ 0x7fa73cd000c0] silence_start: 5.58311 
[silencedetect @ 0x7fa73cd000c0] silence_end: 6.13984 | silence_duration: 0.556735 
[silencedetect @ 0x7fa73cd000c0] silence_start: 7.6729 
size=N/A time=00:00:09.12 bitrate=N/A speed= 675x

I'd like to extract the values that follow the "silence_start:" and "silence_end:" bits (i.e. the values 1.32515, 1.88188, ..., 7.6729), as well as the value after "time=" (i.e. 00:00:09.12).

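I'm new to grep/sed/awk and am trying to learn how to use one of them to do this, but after a lot of struggling I haven't gotten anywhere. I've tried various ideas and searched online, but success still eludes me. Python suggestions/solutions would also be great; I gave it a try and it was a mess.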

Can anyone please offer some help? I'd really appreciate it... Thanks in advance!


2017-09-16 mtpadila

Can you show us your attempts so far? Someone might be able to help fix them...

With GNU grep and a Perl regular expression (-P):

grep -Po '(silence_start: |silence_end: |time=)\K[0-9:.]+' file

Output:

1.32515 
1.88188 
2.99698 
3.42311 
5.58311 
6.13984 
7.6729 
00:00:09.12
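Since the question also asked about Python, here is a rough equivalent of the grep command. It is a minimal sketch that assumes the whole log fits in memory. The prefix is matched but not captured, so re.findall() returns only the value, much as \K does for grep. The names LOG, PATTERN and extract_values are made up for illustration, and LOG is simply the sample input from the question.

```python
import re
from typing import List

# Sample input copied from the question.
LOG = """\
[silencedetect @ 0x7fa73cd000c0] silence_start: 1.32515
[silencedetect @ 0x7fa73cd000c0] silence_end: 1.88188 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 2.99698
[silencedetect @ 0x7fa73cd000c0] silence_end: 3.42311 | silence_duration: 0.426122
[silencedetect @ 0x7fa73cd000c0] silence_start: 5.58311
[silencedetect @ 0x7fa73cd000c0] silence_end: 6.13984 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 7.6729
size=N/A time=00:00:09.12 bitrate=N/A speed= 675x
"""

# The prefix is a non-capturing group, so findall() returns only group 1 (the value),
# which is the job \K does in the grep -P command.
PATTERN = re.compile(r"(?:silence_start: |silence_end: |time=)([0-9:.]+)")


def extract_values(text: str) -> List[str]:
    """Return every value that follows silence_start:, silence_end: or time=."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    for value in extract_values(LOG):
        print(value)
```

The values come back as strings. Call float() on the silence values if you need numbers, since 00:00:09.12 is not a float.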


2017-09-16 08:18:27 Cyrus

GNU awk solution:

$ cat tst.awk
{
    # replace the whole line with capture group 3 (the value)
    s = gensub(/.*(time=|silence_(start|end): )([0-9.:]+).*/, "\\3", "g")
    print s
}

Explanation of the regex:

.*                        # anything
(                         # group 1 start
    time=                 # matching string "time="
    |                     # OR
    silence_(start|end):  # matching string "silence_start: " or "silence_end: "
                          # (the colon is followed by a space; (start|end) is group 2)
)                         # group 1 end
(                         # group 3 start
    [0-9.:]+              # combination of digits, "." and ":"
)                         # group 3 end
.*                        # anything

You can use this as:

$ awk -f tst.awk input.txt 
1.32515 
1.88188 
2.99698 
3.42311 
5.58311 
6.13984 
7.6729 
00:00:09.12

Or as a one-liner:

awk '{s=gensub(/.*(time=|silence_(start|end): )([0-9.:]+).*/, "\\3", "g"); print s}' input.txt
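For comparison, here is a Python sketch of the same per-line substitution. re.sub() with the \3 backreference mirrors gensub(..., "\\3", ...), including returning a non-matching line unchanged. The function name value_per_line is hypothetical, and LOG is the question's sample input.

```python
import re
from typing import List

# Sample input copied from the question.
LOG = """\
[silencedetect @ 0x7fa73cd000c0] silence_start: 1.32515
[silencedetect @ 0x7fa73cd000c0] silence_end: 1.88188 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 2.99698
[silencedetect @ 0x7fa73cd000c0] silence_end: 3.42311 | silence_duration: 0.426122
[silencedetect @ 0x7fa73cd000c0] silence_start: 5.58311
[silencedetect @ 0x7fa73cd000c0] silence_end: 6.13984 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 7.6729
size=N/A time=00:00:09.12 bitrate=N/A speed= 675x
"""

# Same pattern as the gensub() call: group 1 is the prefix,
# group 2 is (start|end) and group 3 is the value we keep.
LINE_RE = re.compile(r".*(time=|silence_(start|end): )([0-9.:]+).*")


def value_per_line(lines: List[str]) -> List[str]:
    """Replace each line with capture group 3; non-matching lines come back unchanged."""
    return [LINE_RE.sub(r"\3", line) for line in lines]


if __name__ == "__main__":
    print("\n".join(value_per_line(LOG.splitlines())))
```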


2017-09-16 08:37:28

A sed solution:

sed -E 's/.*(silence_(start|end): |time=)([^[:space:]]+).*/\3/' file

\3 - refers to the 3rd parenthesized capture group (...)

Output:

1.32515 
1.88188 
2.99698 
3.42311 
5.58311 
6.13984 
7.6729 
00:00:09.12
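Here is a Python sketch in the spirit of the sed command. Python's re module has no POSIX [[:space:]] class, so \S+ stands in for [^[:space:]]+. Unlike the sed command above, this version assumes you want non-matching lines dropped, which is what sed -nE 's/.*(silence_(start|end): |time=)([^[:space:]]+).*/\3/p' file would do. It reads lines lazily, so it can be fed an open file. sed_like is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# Sample input copied from the question.
LOG = """\
[silencedetect @ 0x7fa73cd000c0] silence_start: 1.32515
[silencedetect @ 0x7fa73cd000c0] silence_end: 1.88188 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 2.99698
[silencedetect @ 0x7fa73cd000c0] silence_end: 3.42311 | silence_duration: 0.426122
[silencedetect @ 0x7fa73cd000c0] silence_start: 5.58311
[silencedetect @ 0x7fa73cd000c0] silence_end: 6.13984 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 7.6729
size=N/A time=00:00:09.12 bitrate=N/A speed= 675x
"""

# \S+ plays the role of sed's [^[:space:]]+ (Python has no POSIX classes).
SED_RE = re.compile(r".*(silence_(start|end): |time=)(\S+).*")


def sed_like(lines: Iterable[str]) -> Iterator[str]:
    r"""Yield capture group 3 of each matching line and skip lines that don't match,
    like sed -n 's/.../\3/p' rather than plain sed."""
    for line in lines:
        m = SED_RE.match(line.rstrip("\n"))
        if m:
            yield m.group(3)


if __name__ == "__main__":
    for value in sed_like(LOG.splitlines(keepends=True)):
        print(value)
```

With a real file, the same generator works on the file object: with open("file") as f: print(*sed_like(f), sep="\n").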


2017-09-16 08:42:41 RomanPerekhrest

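When your input contains name-value mappings, a solution that first builds an array of those mappings (n2v[] below) and then lets you access the values by name usually proves to be the most robust, and the easiest to enhance later when your requirements change: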

$ cat tst.awk 
BEGIN { FS="[ =]" } 
{ 
    for (i=1; i<=NF; i++) { 
     sub(/:$/,"",$i) 
     n2v[$i] = $(i+1) 
    } 
    prt("silence_start") 
    prt("silence_end") 
    prt("time") 
} 
function prt(name) { 
    if (name in n2v) { 
     print name, n2v[name] 
     delete n2v[name] 
    } 
} 

$ awk -f tst.awk file 
silence_start 1.32515 
silence_end 1.88188 
silence_start 2.99698 
silence_end 3.42311 
silence_start 5.58311 
silence_end 6.13984 
silence_start 7.6729 
time 00:00:09.12

Remove "name," from the print statement if you only want the values in the output.
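The same name-to-value idea translates directly to a Python dict. This minimal sketch assumes the wanted names never span lines, so it builds one dict per line instead of keeping n2v across lines. Splitting on [ =] reproduces FS="[ =]", and stripping a trailing ":" mirrors sub(/:$/,"",$i). The helpers name_values and extract are hypothetical names, not part of any library.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Sample input copied from the question.
LOG = """\
[silencedetect @ 0x7fa73cd000c0] silence_start: 1.32515
[silencedetect @ 0x7fa73cd000c0] silence_end: 1.88188 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 2.99698
[silencedetect @ 0x7fa73cd000c0] silence_end: 3.42311 | silence_duration: 0.426122
[silencedetect @ 0x7fa73cd000c0] silence_start: 5.58311
[silencedetect @ 0x7fa73cd000c0] silence_end: 6.13984 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 7.6729
size=N/A time=00:00:09.12 bitrate=N/A speed= 675x
"""

# A single space or "=" separates fields, like the awk script's FS="[ =]".
FIELD_SEP = re.compile(r"[ =]")


def name_values(line: str) -> Dict[str, str]:
    """Map every field (trailing ':' removed) to the field that follows it."""
    fields = FIELD_SEP.split(line.rstrip("\n"))
    n2v = {}
    for i, name in enumerate(fields):
        # Same as sub(/:$/,"",$i); the last field maps to "" just like $(NF+1).
        n2v[name.removesuffix(":")] = fields[i + 1] if i + 1 < len(fields) else ""
    return n2v


def extract(lines: Iterable[str],
            wanted: Tuple[str, ...] = ("silence_start", "silence_end", "time"),
            ) -> Iterator[Tuple[str, str]]:
    """Yield (name, value) for each wanted name found on a line, in `wanted` order."""
    for line in lines:
        n2v = name_values(line)
        for name in wanted:
            if name in n2v:
                yield name, n2v[name]


if __name__ == "__main__":
    for name, value in extract(LOG.splitlines()):
        print(name, value)
```

str.removesuffix() needs Python 3.9 or later.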

For example, if you wanted to print the silence start, end, and duration as a triple on each line, it would be trivial:

$ cat tst.awk 
BEGIN { FS="[ =]" } 
{ 
    for (i=1; i<=NF; i++) { 
     sub(/:$/,"",$i) 
     n2v[$i] = $(i+1) 
    } 
} 
"silence_end" in n2v { 
    print n2v["silence_start"], n2v["silence_end"], n2v["silence_duration"] 
    delete n2v 
} 
END { print n2v["time"] } 

$ awk -f tst.awk file 
1.32515 1.88188 0.556735 
2.99698 3.42311 0.426122 
5.58311 6.13984 0.556735 
00:00:09.12

The above will work with any awk in any shell on any UNIX installation (or on Windows, if you have a Windows awk).
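Here is a Python sketch of the triple-per-line variant. Like the awk version, it accumulates name/value pairs until a silence_end appears, then emits one triple and clears the dict. It assumes (as in the sample) that every silence_end line is preceded by a silence_start line. It also converts the silence values to floats, since that is what the question is ultimately after. name_values and silence_triples are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Sample input copied from the question.
LOG = """\
[silencedetect @ 0x7fa73cd000c0] silence_start: 1.32515
[silencedetect @ 0x7fa73cd000c0] silence_end: 1.88188 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 2.99698
[silencedetect @ 0x7fa73cd000c0] silence_end: 3.42311 | silence_duration: 0.426122
[silencedetect @ 0x7fa73cd000c0] silence_start: 5.58311
[silencedetect @ 0x7fa73cd000c0] silence_end: 6.13984 | silence_duration: 0.556735
[silencedetect @ 0x7fa73cd000c0] silence_start: 7.6729
size=N/A time=00:00:09.12 bitrate=N/A speed= 675x
"""

FIELD_SEP = re.compile(r"[ =]")  # same separator as FS="[ =]"


def name_values(line: str) -> Dict[str, str]:
    """Map every field (trailing ':' removed) to the field that follows it."""
    fields = FIELD_SEP.split(line.rstrip("\n"))
    return {name.removesuffix(":"): (fields[i + 1] if i + 1 < len(fields) else "")
            for i, name in enumerate(fields)}


def silence_triples(lines: Iterable[str]
                    ) -> Tuple[List[Tuple[float, float, float]], Optional[str]]:
    """Return one (start, end, duration) triple per silence, plus the final time= value."""
    n2v: Dict[str, str] = {}
    triples = []
    for line in lines:
        n2v.update(name_values(line))
        # A silence_end line completes a triple, like the awk pattern "silence_end" in n2v.
        if "silence_end" in n2v:
            triples.append((float(n2v["silence_start"]),
                            float(n2v["silence_end"]),
                            float(n2v["silence_duration"])))
            n2v.clear()  # like awk's `delete n2v`
    # Whatever is left after the last line plays the role of the END block.
    return triples, n2v.get("time")


if __name__ == "__main__":
    triples, final_time = silence_triples(LOG.splitlines())
    for start, end, duration in triples:
        print(start, end, duration)
    print(final_time)
```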


2017-09-16 16:21:51
