sed one-liner - Find delimiter pair surrounding keyword

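I often work with large XML files, and I generally do word counts via grep to confirm certain statistics.

For example, I want to make sure I have at least five instances of widget in a single XML file via: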

cat test.xml | grep -ic widget
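One caveat with the count above: grep -c reports the number of matching lines, not the number of occurrences, so a line that contains widget twice is counted once. A minimal sketch of the difference, assuming a grep that supports -o (GNU and BSD grep both do); the sample file contents are made up for illustration:

```shell
# Hypothetical sample: three occurrences of "widget" spread over two lines.
sample=$(mktemp)
printf '%s\n' '<a>widget Widget</a>' '<b>blah</b>' '<c>WIDGET</c>' > "$sample"

# -c counts matching lines (case-insensitively because of -i).
line_count=$(grep -ic widget "$sample")

# -o prints every match on its own line, so wc -l counts occurrences.
# tr strips the padding that some wc implementations add.
match_count=$(grep -io widget "$sample" | wc -l | tr -d ' ')

echo "matching lines: $line_count, occurrences: $match_count"
rm -f "$sample"
```

If every widget sits on its own line, as in the example file in this question, both numbers agree.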

Additionally, I'd like to be able to log the lines that widget appears on, i.e.:

cat test.xml | grep -i widget > ~/log.txt

However, the key information I really need is the block of XML code that widget appears in. An example file might look like:

<test> blah blah 
    blah blah blah 
    widget 
    blah blah blah 
</test> 

<formula> 
    blah 
    <details> 
    widget 
    </details> 
</formula>

I am trying to get the following output from the example text above, i.e.:

<test>widget</test> 

<formula>widget</formula>

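In effect, I'm trying to get a single line, using the highest level of markup tags, for each block of XML text/code that contains an arbitrary string, widget.

Does anyone have any suggestions for implementing this via a command-line one-liner?

Thank you.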

Source

2012-07-20 DevNull

Take a look at [this post](http://stackoverflow.com/questions/2222150/extraction-of-data-from-a-simple-xml-file). Maybe it will give you some ideas. – mtk 2012-07-20 23:19:42

A non-elegant way using both sed and awk:

sed -ne '/[Ww][Ii][Dd][Gg][Ee][Tt]/,/^<\// {//p}' file.txt | awk 'NR%2==1 { sub(/^[ \t]+/, ""); search = $0 } NR%2==0 { end = $0; sub(/^<\//, "<"); printf "%s%s%s\n", $0, search, end }'

Results:

<test>widget</test> 
<formula>widget</formula>

Explanation:

## The sed pipe: 

sed -ne '/[Ww][Ii][Dd][Gg][Ee][Tt]/,/^<\// {//p}' 
## This finds a line containing widget (in any letter case), then continues
## to the next closing tag at the start of a line, i.e. the top-level closing tag.
## The empty regex // reuses the last regex that was used, so only those two
## lines are printed for each pattern match

## Now the awk pipe: 

NR%2==1 { sub(/^[ \t]+/, ""); search = $0 } 
## This takes the first line (the widget pattern) and removes leading 
## whitespace, saving the pattern in 'search' 

NR%2==0 { end = $0; sub(/^<\//, "<"); printf "%s%s%s\n", $0, search, end } 
## This finds the next line (which is even), and stores the markup tag in 'end' 
## We then remove the slash from this tag and print it, the widget pattern, and 
## the saved markup tag

HTH

Source

2012-07-20 23:56:51 Steve

sed -nr '/^(<[^>]*>).*/{s//\1/;h};/widget/{g;p}' test.xml

prints

<test> 
<formula>

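The sed one-liner gets more complicated if it has to print the exact format you want.

Edit:
You can use /widget/I instead of /widget/ for a case-insensitive match of widget in GNU sed; otherwise use [Ww] for each letter, as in the other answer.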

Source

2012-07-21 05:17:57 nshy

This might work for you (GNU sed):

sed '/^<[^/]/!d;:a;/^<\([^>]*>\).*<\/\1/!{$!N;ba};/^<\([^>]*>\).*\(widget\).*<\/\1/s//<\1\2<\/\1/p;d' file

Source

2012-07-21 08:40:43 potong

This needs gawk, because it uses a regular expression in RS:

BEGIN { 
    # make a stream of words 
    RS="(\n| )"
} 

# match </tag> 
/<\// { 
    s-- 
    next 
} 

# match <tag> 
/</ { 
    if (!s) { 
        tag=substr($0, 2)
    } 
    s++ 
} 

$0=="widget" { 
    print "<" tag $0 "</" tag 
}

Source

2012-07-27 18:41:35 slitvinov
