2015-09-14

wget - how to skip files that are not found?

I use wget to download files from the internet, with the -O option to save each image under a custom filename. Sometimes the file is not found and a 404 error code is returned. For example, I run the following command:

wget 'http://www.example.com/path/to/image/file01928.jpg' -O myimagefile.jpg 

The result is:

[email protected]:~# wget 'http://www.example.com/path/to/image/file01928.jpg' -O myimagefile.jpg 
--2015-09-13 23:11:07-- http://www.example.com/path/to/image/file01928.jpg 
Resolving www.example.com (www.example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946 
Connecting to www.example.com (www.example.com)|93.184.216.34|:80... connected. 
HTTP request sent, awaiting response... 404 Not Found 
2015-09-13 23:11:07 ERROR 404: Not Found. 

Even though the file was not found, a file is still saved to my hard disk:

[email protected]:~# ls 
myimagefile.jpg 

Is there a way to skip/cancel (not run the command for) files that are not found? Which options should I use?


Do you want wget not to run if the local file already exists? – xxfelixxx


@xxfelixxx I mean that I have a list of URLs for hundreds of images that need to be downloaded. The list was created 4 months ago. Several of those image URLs no longer exist (expired domains, deleted image files, etc.). I don't want to download the "not found" files; only valid files/URLs should be downloaded. Is there a way to do this? – user3195859

Answer


You can perform a HEAD request to check whether the resource (the image) exists and, if it does, download it. With wget you can use -S to print the server response headers and --spider to check for the resource without downloading it.

man wget

-S 
    --server-response 
     Print the headers sent by HTTP servers and responses sent by FTP servers. 

    --spider 
     When invoked with this option, Wget will behave as a Web spider, which means that 
     it will not download the pages, just check that they are there. For example, you 
     can use Wget to check your bookmarks: 

       wget --spider --force-html -i bookmarks.html 

     This feature needs much more work for Wget to get close to the functionality of 
     real web spiders. 

Here is an example:

#!/bin/bash 

URL='http://www.google.com' 
echo "Checking $URL" 
if wget -S --spider "$URL" 2>&1 | grep -q 'Remote file exists'; then 
    echo "Found $URL, going to fetch it" 
    wget "$URL" -O google.html 
else 
    echo "Url $URL does not exist!" 
fi 

URL='http://www.example.com/path/to/image/file01928.jpg' 
echo "Checking $URL" 
if wget -S --spider "$URL" 2>&1 | grep -q 'Remote file exists'; then 
    echo "Found $URL, going to fetch it" 
    wget "$URL" -O myimagefile.jpg 
else 
    echo "Url $URL does not exist!" 
fi 

Output

Checking http://www.google.com 
Found http://www.google.com, going to fetch it 
--2015-09-14 05:26:34-- http://www.google.com/ 
Resolving www.google.com (www.google.com)... 74.125.239.144, 74.125.239.145, 74.125.239.146, ... 
Connecting to www.google.com (www.google.com)|74.125.239.144|:80... connected. 
HTTP request sent, awaiting response... 200 OK 
Length: unspecified [text/html] 
Saving to: ‘google.html’ 

    [ <=>             ] 18,684  --.-K/s in 0.001s 

2015-09-14 05:26:34 (13.9 MB/s) - ‘google.html’ saved [18684] 

Checking http://www.example.com/path/to/image/file01928.jpg 
Url http://www.example.com/path/to/image/file01928.jpg does not exist! 
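Since the question mentions a list of hundreds of image URLs, the check above can be wrapped in a loop. The sketch below assumes the URLs sit one per line in a file called urls.txt (a hypothetical filename; the question does not name the list file) and that each image should be saved under the last component of its URL path. As an extra safeguard, the output file is removed whenever wget exits with a non-zero status (as it does on a 404), so no empty file is left behind even if a download slips past the spider check.

```shell
#!/bin/bash
# Assumption: urls.txt (hypothetical) holds one image URL per line.
while IFS= read -r URL; do
    [ -z "$URL" ] && continue           # skip blank lines
    OUT="${URL##*/}"                    # save under the last path component, e.g. file01928.jpg
    echo "Checking $URL"
    if wget -S --spider "$URL" 2>&1 | grep -q 'Remote file exists'; then
        echo "Found $URL, going to fetch it"
        # if the download still fails, delete the partial output file
        wget "$URL" -O "$OUT" || rm -f "$OUT"
    else
        echo "Url $URL does not exist, skipping"
    fi
done < urls.txt
```

The `|| rm -f "$OUT"` part works because wget returns a non-zero exit status on server errors such as 404, so the cleanup runs only when the download did not succeed.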