Finding links with a regular expression

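I'm currently trying to learn Linux commands and regular expressions, and I'm stuck on a small problem. I'm trying to use sed and a regular expression to find a series of links in a file. Can anyone help me work out where I've gone wrong? The links look like this: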

<a href="../a-lot-of-different/words-that/should-link.html">Useful links</a> 
<a href="..//a-lot-of-different/words-that/should-find-lots-of-links.html">Multiple links</a> 
<a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>

This is what I have so far:

sed -n '/<a*href="^[../"]*\([a-z]*\)^[.html](["]*\)/p' /file > newfile


2014-10-29 knowlage

If it is an HTML file, I'd suggest using a DOM parser. See http://unix.stackexchange.com/questions/6389/parse-html-on-linux and http://stackoverflow.com/questions/893585/how-to-parse-xml-in-bash – Phil 2014-10-29 23:31:32

Regular expressions are not ideal for parsing HTML.

You didn't show the output you want. I'm guessing that you want to extract the links. If so, try:

$ sed -rn 's/.*<a\s+href="([^"]*)".*/\1/p' file 
../a-lot-of-different/words-that/should-link.html 
..//a-lot-of-different/words-that/should-find-lots-of-links.html 
../another-word-and-links/multiple-words/sjshfi-dfg.html

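How it works:

.*<a\s+href="

This matches everything up to and including the opening double quote of the link.

([^"]*)

This matches the link itself and captures it in group \1.

".*

This matches the closing double quote after the link and everything that follows it on the line.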
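The `-r` flag and the `\s` shorthand are GNU sed extensions. As a minimal sketch, assuming a sed that accepts `-E` (current GNU and BSD versions do), the same substitution can be written with the POSIX class `[[:space:]]` instead. The file name `links.html` and the function name `extract_links` are placeholders invented for this illustration.

```shell
# Hypothetical sample file, built from the links in the question.
cat > links.html <<'EOF'
<a href="../a-lot-of-different/words-that/should-link.html">Useful links</a>
<a href="..//a-lot-of-different/words-that/should-find-lots-of-links.html">Multiple links</a>
<a href="../another-word-and-links/multiple-words/sjshfi-dfg.html">more links</a>
EOF

# -E enables extended regular expressions (the portable spelling of -r);
# [[:space:]] stands in for the GNU-only \s.
# -n plus the p flag prints only lines where the substitution succeeded.
extract_links() {
    sed -En 's/.*<a[[:space:]]+href="([^"]*)".*/\1/p' "$1"
}

extract_links links.html
```

Because the leading `.*` is greedy, this form prints only the last link on a line that contains several anchors. That is fine for the sample input, where each line holds a single link.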


2014-10-29 23:44:54 John1024

Thank you, this makes it much clearer, and it found one of the links I was looking for. – knowlage 2014-10-30 00:37:05

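An anchor tag contains the href attribute, so searching for href solves the problem. Note that this prints every line containing a match in full, not just the link: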

sed -n '/href=".*"/p' link_file.txt
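Building on the idea of searching for href alone, here is a rough sketch that prints only the link targets rather than whole lines. It assumes a grep that supports `-o` (GNU and BSD grep do, although the option is not required by POSIX). The file name `two_links.html` and the function name `list_hrefs` are made up for this example.

```shell
# Hypothetical one-line sample with two anchors on the same line.
printf '%s\n' '<a href="a.html">A</a> <a href="b/c.html">B</a>' > two_links.html

# grep -o prints each match on its own line, so several links per line are handled;
# sed then strips the leading href=" and the trailing quote from each match.
list_hrefs() {
    grep -o 'href="[^"]*"' "$1" | sed 's/^href="//; s/"$//'
}

list_hrefs two_links.html
```

For the sample line this prints `a.html` and `b/c.html` on separate lines. The pattern matches any `href="..."`, including ones on tags other than `<a>`, so it is a loose filter rather than a real HTML parser.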


2014-10-29 23:52:11 Hackaholic
