2016-11-19

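What is a short way in Linux to extract a pattern string that is followed later by another pattern string?

Suppose we have one line of text stored in a file: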

// In the actual file this will be one line 
{unrelated_text1,ID:13, unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}, 
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}, 
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700}, 
{ID:40,final_unrelated_text} 

What I want is to extract 3 entries from this particular input:

// The details, such as whether to put { character in front or not do not matter. 
// Any form of output which extracts only these 3 entries and groups them in a 
// visually nice way will do the job. 
{ID:13, TIMESTAMP:1476280500} 
{ID:25, TIMESTAMP:1476280600} 
{ID:30, TIMESTAMP:1476280700} 
// I do not want the last entry, because it does not contain timestamp field. 

The closest command I have found so far is

grep -Po '{ID:[0-9]+(.+?)}' input_file 

which gives the output

{unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3} 
{other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600} 
{ID:30,more_unrelated_text1,TIMESTAMP:1476280700} 
{ID:40,final_unrelated_text} 

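The next improvement I am looking for is how to strip the unrelated_text from each entry and drop the last entry.

Question: what is the simplest way to do this in Linux?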

Answer


With GNU awk for multi-char RS, RT, and word boundaries:

$ awk -v RS='\\<(ID|TIMESTAMP):[0-9]+' 'NR%2{id=RT;next} RT{printf "{%s, %s}\n", id, RT}' file 
{ID:13, TIMESTAMP:1476280500} 
{ID:25, TIMESTAMP:1476280600} 
{ID:30, TIMESTAMP:1476280700} 

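The above will work whether the input is on one line or spread over several, and regardless of what other text the file contains. All it relies on is each ID appearing before its associated TIMESTAMP, and that is not hard to change if necessary.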
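For readers without GNU awk, here is a rough alternative sketch using grep and sed. It is not part of the original answer and rests on some assumptions: every entry is wrapped in braces with no nesting, and ID comes before TIMESTAMP inside an entry. The function name extract_pairs is hypothetical, and grep -o and sed -E are common GNU/BSD extensions rather than strict POSIX. Because it matches the two fields per entry, an entry without a TIMESTAMP is skipped wherever it appears, whereas the awk command above alternates records and so expects such an entry only at the end.

```shell
# Hypothetical helper: reads text on stdin and prints one
# "{ID:n, TIMESTAMP:n}" line per brace-delimited entry that has both fields.
# Step 1 (grep -o): print each {...} entry on its own line.
# Step 2 (sed -n ... p): print only entries where ID is followed by TIMESTAMP,
#                        reduced to just those two fields.
# Note: the ID pattern has no word-boundary check, so a field such as UUID:5
#       would also match; tighten the pattern if your data needs that.
extract_pairs() {
  grep -o '{[^{}]*}' |
    sed -nE 's/.*(ID:[0-9]+).*(TIMESTAMP:[0-9]+).*/{\1, \2}/p'
}
```

It would be called as extract_pairs < input_file; for the sample input in the question this should print the same three lines as the awk command.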
