2011-12-02 23 views
0

How do I download images from local HTML files?

I have some simple HTML pages: Test.html, test2.html, test3.html. The pages contain links to images, for example:

<img src="http://site.org/path/to/file/6c7f2.jpeg"/> 

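How can I automatically download all the images referenced by these pages, save them next to the HTML files, and change the links in the HTML pages to point to the local images?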

Thanks!

Answers

0

Try the command $ wget -F -i <html_file>

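This will download every link contained in your <html_file> and put the files in the current directory. I suggest you read the wget manual ($ man wget), from which I extracted the following option descriptions: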

-i file --input-file=file

Read URLs from a local or external file. If - is specified as file, URLs are 
read from the standard input. (Use ./- to read from a file literally named -.) 

    If this function is used, no URLs need be present on the command line. If 
there are URLs both on the command line and in an input file, those on the 
command lines will be the first ones to be retrieved. If --force-html is not 
specified, then file should consist of a series of URLs, one per line. 

    However, if you specify --force-html, the document will be regarded as html. 
In that case you may have problems with relative links, which you can solve 
either by adding "<base href="url">" to the documents or by specifying 
--base=url on the command line. 

    If the file is an external one, the document will be automatically treated as 
html if the Content-Type matches text/html. Furthermore, the file's location 
will be implicitly used as base href if none was specified. 

And the option:

-F --force-html

When input is read from a file, force it to be treated as an HTML file. 
This enables you to retrieve relative links from existing HTML files on 
your local disk, by adding "<base href="url">" to HTML, or using the 
--base command-line option. 
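To illustrate what the base URL does for relative links, here is a minimal Python sketch using the standard library's urllib.parse.urljoin. The base URL and the sample links are made-up examples chosen for illustration; they are not taken from the question's pages, and this is not how wget is implemented internally.

```python
from urllib.parse import urljoin

# Hypothetical base URL, playing the role of --base=url or <base href="url">.
BASE = "http://site.org/path/to/page.html"


def resolve(link, base=BASE):
    """Resolve a link found in an HTML file against the base URL."""
    return urljoin(base, link)


if __name__ == "__main__":
    # A relative path, a root-relative path, and an already absolute URL.
    for link in ["file/6c7f2.jpeg", "/img/logo.png", "http://other.example/a.gif"]:
        print(link, "->", resolve(link))
```

Without a base, a relative link such as file/6c7f2.jpeg in a local file gives the downloader no host to fetch from, which is why the manual mentions the two ways of supplying one.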

Also, I suggest you read about the --output-file option in the man page.
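If you end up driving wget from a script, the command above could be assembled roughly like this. It is only a sketch: the function names wget_command and download_links are made up for illustration, it assumes wget is installed and on your PATH, and it uses the long forms of the options quoted from the manual (--force-html, --input-file, --base, --output-file).

```python
import subprocess


def wget_command(html_file, base_url=None, log_file=None):
    """Build the argument list for: wget -F -i <html_file> [--base=...] [--output-file=...]."""
    cmd = ["wget", "--force-html", "--input-file=" + html_file]
    if base_url:
        # Only needed when the HTML file contains relative links.
        cmd.append("--base=" + base_url)
    if log_file:
        # Write wget's messages to a log file instead of the terminal.
        cmd.append("--output-file=" + log_file)
    return cmd


def download_links(html_file, **kwargs):
    """Run wget and return its exit status (requires wget on PATH)."""
    return subprocess.run(wget_command(html_file, **kwargs), check=False).returncode


# Example call (not executed here):
#     download_links("Test.html", log_file="wget.log")
```

Keep in mind that this fetches every link in the file, not only the images.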

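This only takes care of the downloading. To have your HTML files changed automatically, I think you need other tools: shell scripting either does not provide them or, if it does, they are very complicated to use. I suggest writing a Python script that uses the command above to download the files, plus a Python library specialized in parsing HTML to make the changes conveniently.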
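A rough sketch of that idea, using only the Python standard library (html.parser to find the img tags, urllib to fetch them) instead of calling wget. The names ImgCollector, local_name, localize and localize_file are hypothetical, and the approach makes simplifying assumptions: the URL is swapped by plain text replacement, so it also changes the same URL anywhere else in the page; URLs containing HTML entities such as &amp; will not match; and two images with the same file name would overwrite each other.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlretrieve


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags like <img .../>.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def local_name(url):
    """File name part of a URL path, similar to the basename tool."""
    return posixpath.basename(urlparse(url).path)


def localize(html, download=None):
    """Return html with remote <img> sources replaced by local file names.

    If download is given, it is called as download(url, filename) per image.
    """
    collector = ImgCollector()
    collector.feed(html)
    # Longest URLs first, so one URL that is a prefix of another is not clobbered.
    for src in sorted(set(collector.sources), key=len, reverse=True):
        if urlparse(src).scheme not in ("http", "https"):
            continue  # already local or relative: leave untouched
        name = local_name(src)
        if not name:
            continue  # URL has no file name to save under
        if download:
            download(src, name)
        html = html.replace(src, name)
    return html


def localize_file(path):
    """Rewrite one HTML file in place; images land in the current directory."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(localize(text, download=urlretrieve))
```

Run from the directory that holds the pages, calling localize_file("Test.html") (and likewise for test2.html and test3.html) would save the images next to the HTML files and point the img tags at them.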

Good luck!

+0

Thanks! Very helpful. I plan to modify the HTML with sed, basename, and other tools! – kirill

+0

Wow! 'sed' is a very powerful tool, but quite complicated to use... I haven't mastered it yet, hehehe... Good luck! =) – Throoze