Is it possible to mirror an entire website with wget and save all of its links in a txt file?
If so, how is it done? If not, is there another way to do this?
Edit:
I tried running this command:
wget -r --spider example.com
and got the following output:
Spider mode enabled. Check if remote file exists.
--2015-10-03 21:11:54-- http://example.com/
Resolving example.com... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to example.com|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Remote file exists and could contain links to other resources -- retrieving.
--2015-10-03 21:11:54-- http://example.com/
Reusing existing connection to example.com:80.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Saving to: 'example.com/index.html'
100%[=====================================================================================================>] 1,270 --.-K/s in 0s
2015-10-03 21:11:54 (93.2 MB/s) - 'example.com/index.html' saved [1270/1270]
Removing example.com/index.html.
Found no broken links.
FINISHED --2015-10-03 21:11:54--
Total wall clock time: 0.3s
Downloaded: 1 files, 1.2K in 0s (93.2 MB/s)
(Yes, I also tried using other websites with more internal links)
Yes, that is how it should work. The actual site example.com has no internal links, so it only returns itself. Try a site that links to other pages within itself and you should get more results. Do you also want links to *external* sites? If so, the Python script from @Randomazer may be a better option. – seumasmac
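For reference, a minimal sketch of collecting the crawled links into a txt file, assuming a GNU wget whose log prints each requested URL on the lines beginning with "--" (as in the output above); example.com and links.txt are placeholders:

wget --spider -r http://example.com 2>&1 \
  | grep '^--' \
  | awk '{ print $3 }' \
  | sort -u > links.txt

wget writes its log to stderr, which is why the 2>&1 redirection is needed before piping into grep.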
Actually, there is a similar question at http://stackoverflow.com/questions/2804467/spider-a-website-and-return-urls-only which may be useful. – seumasmac
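Along the lines of that question, a variant of the same pipeline is sometimes used to keep only page URLs; this is only a sketch, and the depth limit and the file-extension filter are assumptions to adjust for your site:

wget --spider -r -l2 http://example.com 2>&1 \
  | grep '^--' \
  | awk '{ print $3 }' \
  | grep -v '\.\(css\|js\|png\|gif\|jpg\)$' > urls.txt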
Thank you very much! That helped! – user1878980