awk - splitting an HTML file by its comments

I have an HTML file with comments like these (some of them can be nested):

<!-- Begin foo.html --> 
<p>some html code</p> 

    <!-- Begin foo2.html --> 
    <p>some html code</p> 
    <!-- End foo2.html --> 

<!-- End foo.html --> 

<!-- Begin bar.html --> 
<p>some html code</p> 
<!-- End bar.html -->

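I want to split the HTML file into foo.html, foo2.html and bar.html. The number of comment blocks is unknown, and the name in each comment should be used as that block's file name. So far I have this awk line: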

awk '/<!-- Begin (.*?)-->/ {f=$1} f{print > f} /<!-- End \1 -->/{close f; f=""}' index.html

But it doesn't work properly.

Any help on how to fix this, or suggestions for another approach?


2011-12-23 Thomas Goetz

And what should happen to foo2.html? Why are you using awk for this? – 2011-12-23 04:08:32

Sorry, foo2.html must be split out as well. I actually thought awk could do the job. – 2011-12-23 04:23:18

So you mean foo2.html needs to be split out into its own foo2.html file? You should update your question with this detail. – 2011-12-23 04:38:03

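I'm not entirely clear on the question, but if you know the specific comments, you can use a regex range. Note that the foo2.html section will also be included in foo.html. Something like this: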

awk ' 
/Begin foo.html/,/End foo.html/{print $0 > "foo.html"} 
/Begin bar.html/,/End bar.html/{print $0 > "bar.html"}' index.html

Test:

[jaypal:~/Temp] cat index.html 
<!-- Begin foo.html --> 
<p>some html code</p> 

    <!-- Begin foo2.html --> 
    <p>some html code</p> 
    <!-- End foo2.html --> 

<!-- End foo.html --> 

<!-- Begin bar.html --> 
<p>some html code</p> 
<!-- End bar.html --> 

[jaypal:~/Temp] awk '/Begin foo.html/,/End foo.html/{print $0 > "foo.html"} 
/Begin bar.html/,/End bar.html/{print $0 > "bar.html"}' index.html 

[jaypal:~/Temp] cat foo.html 
<!-- Begin foo.html --> 
<p>some html code</p> 

    <!-- Begin foo2.html --> 
    <p>some html code</p> 
    <!-- End foo2.html --> 

<!-- End foo.html --> 

[jaypal:~/Temp] cat bar.html 
<!-- Begin bar.html --> 
<p>some html code</p> 
<!-- End bar.html -->


2011-12-23 04:29:21

Thanks, but in my case I don't know the names foo.html or foo2.html in advance, which is why I used /<!-- Begin (.*?)-->/ – 2011-12-23 09:19:13
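If you want the same result as the range answer above without hard-coding the names (each file keeps its Begin/End markers, and foo.html also contains the nested foo2.html section), one option is to keep a stack of the currently open block names and copy every line into all of them. This is a minimal sketch, not a tested general solution. It assumes every marker line has the form <!-- Begin NAME --> or <!-- End NAME -->, with NAME as the third whitespace-separated field and containing no spaces.

```shell
#!/bin/sh
# Sketch: split by Begin/End comments without knowing the names in advance,
# keeping the marker lines and copying nested blocks into their parents.
# Assumes NAME is the 3rd whitespace-separated field of each marker line.
cd "$(mktemp -d)" || exit 1

cat > input.txt <<'EOF'
<!-- Begin foo.html -->
<p>some html code</p>

    <!-- Begin foo2.html -->
    <p>some html code</p>
    <!-- End foo2.html -->

<!-- End foo.html -->

<!-- Begin bar.html -->
<p>some html code</p>
<!-- End bar.html -->
EOF

awk '
  /<!-- Begin/ { stack[sp++] = $3 }                # open a block; no next, so the marker is kept
  { for (i = 0; i < sp; i++) print > stack[i] }   # copy the line into every open block
  /<!-- End/   { if (sp > 0) sp-- }                # pop only after the End marker was written
' input.txt
```

With this input, foo.html ends up as lines 1 to 8 of input.txt (matching the cat foo.html output above), foo2.html as lines 4 to 6, and bar.html as lines 10 to 12.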

$ cat input.txt 
<!-- Begin foo.html --> 
<p>some html code</p> 

    <!-- Begin foo2.html --> 
    <p>some html code</p> 
    <!-- End foo2.html --> 

<!-- End foo.html --> 

<!-- Begin bar.html --> 
<p>some html code</p> 
<!-- End bar.html --> 

$ awk '/<!-- Begin/{stack[sp++]=$3; print ">>>", $3; next}; /<!-- End/{sp--; print "<<<", $3; next}; {if(sp>0) print > stack[sp-1]}' input.txt 
>>> foo.html 
>>> foo2.html 
<<< foo2.html 
<<< foo.html 
>>> bar.html 
<<< bar.html 

$ for i in {foo,foo2,bar}.html; do echo "=====$i======"; cat $i; done 
=====foo.html====== 
<p>some html code</p> 


=====foo2.html====== 
    <p>some html code</p> 
=====bar.html====== 
<p>some html code</p>

I added debug messages. With print ">>>", $3 (and the matching "<<<" print) removed, the code is very short:

$ awk '/<!-- Begin/{stack[sp++]=$3; next}; /<!-- End/{sp--; next}; {if(sp>0) print > stack[sp-1]}' input.txt
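Because the number of blocks is unknown, it may be worth closing each output file when its End marker is reached, since some awk implementations limit how many files can be open at the same time. The sketch below is that variant of the one-liner. It assumes the file name is always the third field of a marker line (default field splitting skips the leading indentation, so this also holds for the nested marker) and that each name occurs only once, because reopening a closed file with > would truncate it.

```shell
#!/bin/sh
# Sketch: the stack one-liner, plus close() on each End marker.
# Assumes each NAME appears once; reopening a closed file with ">" truncates it.
cd "$(mktemp -d)" || exit 1

cat > input.txt <<'EOF'
<!-- Begin foo.html -->
<p>some html code</p>

    <!-- Begin foo2.html -->
    <p>some html code</p>
    <!-- End foo2.html -->

<!-- End foo.html -->

<!-- Begin bar.html -->
<p>some html code</p>
<!-- End bar.html -->
EOF

awk '
  /<!-- Begin/ { stack[sp++] = $3; next }                  # push the block name (3rd field)
  /<!-- End/   { if (sp > 0) close(stack[--sp]); next }    # pop it and close its file
  sp > 0       { print > stack[sp-1] }                     # write to the innermost open block
' input.txt
```

The resulting files have the same contents as in the answer's output. The outer foo.html is still open while foo2.html is being written, so it is never closed and reopened.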

Finally, you should reformat the HTML (the indentation is not right)!
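As the output shows, the nested block's four-space indentation is carried over into foo2.html. A hedged sketch of removing it during the split: remember the leading whitespace of each Begin marker and strip that prefix from the content lines of its block. This assumes content is indented at least as much as its marker. Lines that do not start with that prefix are written unchanged.

```shell
#!/bin/sh
# Sketch: split as in the stack one-liner, but strip the indentation of the
# Begin marker from every content line of its block.
cd "$(mktemp -d)" || exit 1

cat > input.txt <<'EOF'
<!-- Begin foo.html -->
<p>some html code</p>

    <!-- Begin foo2.html -->
    <p>some html code</p>
    <!-- End foo2.html -->

<!-- End foo.html -->

<!-- Begin bar.html -->
<p>some html code</p>
<!-- End bar.html -->
EOF

awk '
  /<!-- Begin/ {
    match($0, /^[ \t]*/)                  # leading whitespace of the marker line
    indent[sp] = substr($0, 1, RLENGTH)   # may be the empty string
    stack[sp++] = $3                      # block name is the 3rd field
    next
  }
  /<!-- End/ { if (sp > 0) sp--; next }
  sp > 0 {
    line = $0
    pre = indent[sp-1]
    # remove the marker indentation only if the line actually starts with it
    if (substr(line, 1, length(pre)) == pre) line = substr(line, length(pre) + 1)
    print line > stack[sp-1]
  }
' input.txt
```

Here foo2.html contains the paragraph without the leading spaces. foo.html and bar.html are unchanged because their markers are not indented.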


2011-12-23 10:40:14 kev

I think this answer comes closest to what the OP wants. – 2011-12-23 16:10:11
