Adding a timestamp to a file downloaded with urllib.urlretrieve

I am downloading files with urllib.urlretrieve, and I would like to add something that checks for changes before downloading. So far I have something like this:

import urllib 

urllib.urlretrieve("http://www.site1.com/file.txt", r"output/file1.txt") 
urllib.urlretrieve("http://www.site2.com/file.txt", r"output/file2.txt")

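Ideally I would like the script to check for changes (by comparing last-modified stamps?), skip the file if it is unchanged, and download it if there is a newer version. I also need the script to add a timestamp to the filename.

Can anyone help?

I am new to programming (Python is my first language), so any criticism is welcome!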

Source

2013-05-20 user2401842

The simplest way to put a timestamp in the filename is:

import time 
'output/file_%d.txt' % time.time()

For a more readable version:

from datetime import datetime 
n = datetime.now() 
n.strftime('output/file_%Y%m%d_%H%M%S.txt')
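
To attach that kind of timestamp to an existing output path instead of hard-coding the path into the format string, something along these lines could work. This is only a sketch: the helper name `timestamped_name` and the `_%Y%m%d_%H%M%S` suffix are illustrative choices, not part of the original answer.

```python
import os
from datetime import datetime


def timestamped_name(path, when=None):
    """Return path with a _YYYYMMDD_HHMMSS suffix inserted before the extension."""
    # Hypothetical helper: falls back to the current local time.
    when = when or datetime.now()
    root, ext = os.path.splitext(path)
    # Only the suffix goes through strftime, so a '%' in the path is left alone.
    return root + when.strftime('_%Y%m%d_%H%M%S') + ext


print(timestamped_name('output/file1.txt'))
```

The result can then be passed as the second argument to urlretrieve in place of the fixed filename.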

Source

2013-05-20 13:50:56

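-1 The question is how to determine whether the resource has changed on the server. –

My question was not entirely clear, but it does mention timestamping the filename – user2401842

This outputs epoch time. Any idea how to turn it into a standard (human-readable) date/time? – user2401842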

-1

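urllib.urlretrieve() already does this for you. If the output filename exists, it performs all the necessary checks to avoid downloading the file again.

However, this only works if the server supports it. So you may want to print the HTTP headers (the second result of the function call) to see whether caching is possible.

This article may also help: http://pymotw.com/2/urllib/

It has code like this near the end: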

import urllib
import os

def reporthook(blocks_read, block_size, total_size):
    if not blocks_read:
        print 'Connection opened'
        return
    if total_size < 0:
        # Unknown size
        print 'Read %d blocks' % blocks_read
    else:
        amount_read = blocks_read * block_size
        print 'Read %d blocks, or %d/%d' % (blocks_read, amount_read, total_size)
    return

try:
    filename, msg = urllib.urlretrieve('http://blog.doughellmann.com/', reporthook=reporthook)
    print
    print 'File:', filename
    print 'Headers:'
    print msg
    print 'File exists before cleanup:', os.path.exists(filename)

finally:
    urllib.urlcleanup()

    print 'File still exists:', os.path.exists(filename)

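This downloads the file, shows progress, and prints the headers. Use it to debug your scenario and find out why caching is not working as expected.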
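
If the printed headers do include `Last-Modified`, one way to decide whether a fresh copy is needed is to compare that value with the local file's modification time. A minimal sketch, assuming Python 3 (where `email.utils.parsedate_to_datetime` is available and the headers object returned by `urllib.request.urlretrieve` supports `.get()`); the function name `remote_is_newer` is made up for illustration:

```python
import os
from email.utils import parsedate_to_datetime


def remote_is_newer(headers, local_path):
    """Compare the Last-Modified header against the local file's mtime."""
    last_modified = headers.get('Last-Modified')
    if last_modified is None or not os.path.exists(local_path):
        # No header or no local copy: assume a download is needed.
        return True
    # Assumes a standard HTTP date such as 'Mon, 20 May 2013 13:50:56 GMT'.
    remote_time = parsedate_to_datetime(last_modified).timestamp()
    return remote_time > os.path.getmtime(local_path)
```

Note that the local mtime is normally the time of the last download rather than the server's own timestamp, so this comparison is only as reliable as the two clocks.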

Source

2013-05-20 13:54:11

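Hi Aaron, my urllib.urlretrieve implementation keeps overwriting the file even though the filename is the same. Is there anything I need to do to invoke this feature? – user2401842

When you say "overwrite", do you see it downloading blocks? –

Do you have any evidence that urlretrieve does this? The retrieve function in my /usr/lib/python2.7/urllib.py definitely doesn't. It never looks at the Last-Modified header and never stats the file to get its mtime, which it would need in order to send an If-Modified-Since header and so make use of a 304 response. The only header it uses is Content-Length, to confirm that the download matches the expected size. It just blindly opens the URL and writes to the file, regardless of whether the file already exists. Confirmed by looking at the web server logs as well as the code (in my case, I control both sides) – barryhunter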
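
The If-Modified-Since / 304 mechanism described in the comment above can be done by hand. The following is a rough sketch, assuming Python 3's `urllib.request` rather than the Python 2 `urllib` used in the question; `build_conditional_request` and `download_if_modified` are hypothetical names, and the local file's mtime stands in for the time of the last download.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, local_path):
    """Build a GET request that asks the server to skip unchanged content."""
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        # HTTP dates are expressed in GMT, e.g. 'Mon, 20 May 2013 13:50:56 GMT'.
        stamp = formatdate(os.path.getmtime(local_path), usegmt=True)
        request.add_header('If-Modified-Since', stamp)
    return request


def download_if_modified(url, local_path):
    """Return True if a new copy was written, False on a 304 response."""
    request = build_conditional_request(url, local_path)
    try:
        with urllib.request.urlopen(request) as response:
            data = response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:  # Not Modified: keep the existing file
            return False
        raise
    with open(local_path, 'wb') as handle:
        handle.write(data)
    return True
```

This only helps if the server honours If-Modified-Since; a server that ignores the header will simply return the full file every time.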
