Removing unimportant text from a file with dash

The file is structured like this:

Text 
Text 
Text 

<--!Important Text begins here--> 
important Text 
Important Text 
Important Text 

<--!Important Text ends here --> 

Unimportant Text 
.... 

<--!Important Text begins here--> 
important Text 
Important Text 
Important Text 

<--!Important Text ends here --> 

Unimportant Text 
....

<--!Important Text begins here-->
important Text 
Important Text 
Important Text 

<--!Important Text ends here --> 

Unimportant Text 
....

and so on.

How can I extract the important parts and save them to a new file? I am using the dash terminal on a Macintosh.

2014-02-25 user3352472

Is this an HTML or XML file? –

It's a txt file that contains HTML code. – user3352472

@user3352472 Do these '<--!Important Text ends here -->' markers actually exist in the file? –

If you want to include the markers, you can do this:

awk '/<--!Important Text begins here-->/,/<--!Important Text ends here -->/' file

If you want to omit the markers and print only the content between them, you can do this:

awk ' 
/<--!Important Text begins here-->/{p=1; next} 
/<--!Important Text ends here -->/{p=0} 
p' file

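The first solution is a regex range: it tells awk to print everything from a line matching the first pattern through the next line matching the second pattern, inclusive. The second solution omits the markers by setting a flag (p) when the begin marker is seen and unsetting it at the end marker; lines are printed only while the flag is set.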
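To compare the two awk commands side by side, here is a minimal sketch that builds a small sample file and redirects each result to a new file. The file names sample.txt, with_markers.txt, and content_only.txt and the sample contents are invented for illustration; only the marker strings come from the question.

```shell
#!/bin/sh
# Hypothetical sample input modeled on the layout shown in the question.
cat > sample.txt <<'EOF'
Text
<--!Important Text begins here-->
Important A
<--!Important Text ends here -->
Unimportant Text
<--!Important Text begins here-->
Important B
<--!Important Text ends here -->
Unimportant Text
EOF

# Range pattern: prints every line from a begin marker through the next end marker.
awk '/<--!Important Text begins here-->/,/<--!Important Text ends here -->/' sample.txt > with_markers.txt

# Flag approach: p is set after a begin marker (next skips the marker line itself)
# and cleared on an end marker, so only the lines in between are printed.
awk '
/<--!Important Text begins here-->/ {p=1; next}
/<--!Important Text ends here -->/  {p=0}
p' sample.txt > content_only.txt

cat content_only.txt
```

With this input, with_markers.txt should hold both blocks including their marker lines, while content_only.txt should hold only the two content lines.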

2014-02-25 20:34:22

Thanks for the edit, @mklement0. :) –

You're welcome; nice use of 'awk', by the way. – mklement0

I tried both and redirected the output to a file, but the file is empty. – user3352472

Try the following:

sed -n '/<--!Important Text begins here-->/,/<--!Important Text ends here -->/ p' \
    infile |
    fgrep -v -e '<--!Important Text begins here-->' \
      -e '<--!Important Text ends here -->' \
    > outfile

Note: This assumes that every <--!Important Text ... marker is on a line of its own.
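The pipeline can be tried on a small file, as in the sketch below. The names infile and outfile are the ones used in the command; the sample contents are invented for illustration, and grep -F is used here as the equivalent spelling of fgrep.

```shell
#!/bin/sh
# Invented sample input; each marker sits on a line of its own.
cat > infile <<'EOF'
Unimportant Text
<--!Important Text begins here-->
Important Text 1
Important Text 2
<--!Important Text ends here -->
Unimportant Text
EOF

# sed -n suppresses default output; the range address prints each marked block.
# grep -F -v then drops the marker lines, matching them as fixed strings.
sed -n '/<--!Important Text begins here-->/,/<--!Important Text ends here -->/p' infile |
  grep -F -v -e '<--!Important Text begins here-->' \
             -e '<--!Important Text ends here -->' \
  > outfile

cat outfile
```

A backslash used for line continuation has to be the very last character on its line. If a space follows it, the shell escapes the space instead of the newline and the command ends at that line.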

2014-02-25 20:04:26 mklement0

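fgrep: -e: No such file or directory fgrep: <!-- Rich web page end: marked as URL -->: No such file or directory fgrep: : No such file or directory sed: RE error: illegal byte sequence – user3352472

@user3352472: Tell me what you get when you run 'fgrep --version'; while you're at it, also run 'sed --version' and 'awk --version'. I'm unclear on what you mean by "dash terminal"; if by "dash" you mean [dash](http://en.wikipedia.org/wiki/Debian_Almquist_shell), that would be surprising. That said, what matters here is not the _shell_ but the versions of the _command-line tools_ you are running. – mklement0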
