Python simple multithreaded download: file is corrupted

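This is my first post. I have been programming in Python and am currently building a multithreaded downloader. The problem is that my downloaded files (JPGs are my target) come out corrupted. With the following input: http://www.aumathletics.com/images_web/headerAUMLogo.jpg

it shows an error,

while with the input: http://www.nasa.gov/images/content/607800main_kepler1200_1600-1200.jpg

the file is corrupted.

Here is the code: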

import os, sys, requests
import threading
import urllib2
import time

URL = sys.argv[1]

def buildRange(value, numsplits):
    lst = []
    for i in range(numsplits):
        if i == 0:
            lst.append('%s-%s' % (i, int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
        else:
            lst.append('%s-%s' % (int(round(1 + i * value/(numsplits*1.0),0)), int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
    return lst

def main(url=None, splitBy=5):
    start_time = time.time()
    if not url:
        print "Please Enter some url to begin download."
        return

    fileName = "image.jpg"
    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)
    print "%s bytes to download." % sizeInBytes
    if not sizeInBytes:
        print "Size cannot be determined."
        return

    dataDict = {}

    # split total num bytes into ranges
    ranges = buildRange(int(sizeInBytes), splitBy)

    def downloadChunk(idx, irange):
        req = urllib2.Request(url)
        req.headers['Range'] = 'bytes={}'.format(irange)
        dataDict[idx] = urllib2.urlopen(req).read()

    # create one downloading thread per chunk
    downloaders = [
        threading.Thread(
            target=downloadChunk,
            args=(idx, irange),
        )
        for idx, irange in enumerate(ranges)
    ]

    # start threads, let run in parallel, wait for all to finish
    for th in downloaders:
        th.start()
    for th in downloaders:
        th.join()

    print 'done: got {} chunks, total {} bytes'.format(
        len(dataDict), sum((
            len(chunk) for chunk in dataDict.values()
        ))
    )

    print "--- %s seconds ---" % str(time.time() - start_time)

    if os.path.exists(fileName):
        os.remove(fileName)

    # reassemble file in correct order
    with open(fileName, 'w') as fh:
        for _idx, chunk in sorted(dataDict.iteritems()):
            fh.write(chunk)

    print "Finished Writing file %s" % fileName
    print 'file size {} bytes'.format(os.path.getsize(fileName))

if __name__ == '__main__':
    main(URL)

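The indentation here may be off, so here is the code on pastebin(dot)com/wGEkp878

I would be grateful if someone could point out the error.

Edit: as suggested by someone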

def buildRange(value, numsplits):
    lst = []
    for i in range(numsplits):
        # the start/end helpers called here are not shown in the post
        first = i if i == 0 else buildRange().start(i, value, numsplits)
        second = buildRange().end(i, value, numsplits)
        lst.append("{}-{}".format(first, second))
    return lst

Can anyone tell me how to keep the downloaded part files, saved under names like part1, part2, and so on?


2015-09-14 AKM

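As a first guess, it looks like your buildRange function is more complicated than it needs to be, and that could also be your problem. More importantly, and I'm sorry this is not an answer to your question, a multithreaded download like this will almost certainly take more time than a single request. The reason is that even though all the pieces are downloading at the same time, you are still limited by your bandwidth, and now you also have the overhead of several extra requests and threads. It is a cool experiment though, and definitely worth finishing. –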
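To illustrate the remark about buildRange, here is a minimal sketch of a simpler splitter that uses only integer arithmetic. It is written for Python 3 (the thread's code is Python 2), and the name `build_ranges` and the size check are assumptions made for this example, not part of the original post. It produces contiguous, inclusive "start-end" strings suitable for an HTTP Range header.

```python
def build_ranges(size, num_splits):
    # Split `size` bytes into `num_splits` contiguous, inclusive byte ranges.
    # Assumption for this sketch: size >= num_splits >= 1, so no range is empty.
    if num_splits < 1 or size < num_splits:
        raise ValueError("need size >= num_splits >= 1")
    ranges = []
    for i in range(num_splits):
        start = i * size // num_splits            # first byte of chunk i
        end = (i + 1) * size // num_splits - 1    # last byte of chunk i (inclusive)
        ranges.append("{}-{}".format(start, end))
    return ranges


# Same size as the file discussed in the thread, split five ways.
example = build_ranges(102331, 5)
```

Because each chunk's end is derived from the next chunk's start, the ranges cannot overlap or leave gaps, and the last range ends at `size - 1`, the index of the final byte.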

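Can you tell me how to store the downloaded parts as separate files named part 1, 2, 3, 4, etc.? – AKM

Your original buildRange seems to work, but the new one is much nicer. The real problem appears to be that extra newline characters are being added! Every time a '\n' is encountered, an extra 0x0D is inserted before it. –

It turns out the file must be opened in binary mode, 'wb' instead of 'w'. If it is opened with 'w', a bunch of extra characters get written. This has to do with the derpy Windows vs. Linux newline semantics. If you use 'wb', exactly what you pass to write() ends up in the file.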
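The effect can be reproduced on any platform with a small sketch. It assumes Python 3, where `open(..., newline="\r\n")` makes text mode translate every written '\n' into '\r\n', which imitates what Windows text mode does in the Python 2 code from the question. The payload and the helper names are made up for illustration.

```python
import os
import tempfile


def write_binary(path, data):
    # 'wb' writes the bytes exactly as given.
    with open(path, "wb") as fh:
        fh.write(data)


def write_windows_text(path, data):
    # Imitate Windows text mode: each LF written is expanded to CR LF.
    # latin-1 maps every byte to one code point, so only newlines change.
    with open(path, "w", encoding="latin-1", newline="\r\n") as fh:
        fh.write(data.decode("latin-1"))


# Hypothetical binary payload containing three 0x0A bytes.
payload = b"\xff\xd8\xff\xe0 fake\njpeg\nbytes \x0a\xff\xd9"

tmpdir = tempfile.mkdtemp()
good = os.path.join(tmpdir, "good.bin")
bad = os.path.join(tmpdir, "bad.bin")
write_binary(good, payload)
write_windows_text(bad, payload)

good_size = os.path.getsize(good)
bad_size = os.path.getsize(bad)
extra = bad_size - good_size  # one extra 0x0D per 0x0A in the payload
```

The text-mode file grows by one byte for every 0x0A in the data, which is the same kind of size mismatch reported in the thread (102331 bytes downloaded, 102948 bytes on disk).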

Edit: if you want to store the individual file parts, you can change

# reassemble file in correct order
with open(fileName, 'w') as fh:
    for _idx, chunk in sorted(dataDict.iteritems()):
        fh.write(chunk)

print "Finished Writing file %s" % fileName
print 'file size {} bytes'.format(os.path.getsize(fileName))

to

# write each chunk to its own part file, in order
for _idx, chunk in sorted(dataDict.iteritems()):
    with open(fileName + ".part-" + str(_idx), 'wb') as fh:
        fh.write(chunk)

print "Finished Writing file %s" % fileName
#print 'file size {} bytes'.format(os.path.getsize(fileName))
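If both the part files and the assembled file are wanted, the two steps can be combined. The following is a sketch under assumptions: it targets Python 3, uses a made-up `chunks` dictionary in place of the downloaded `dataDict`, and the helper names `write_parts` and `join_parts` are hypothetical.

```python
import os
import tempfile


def write_parts(data_dict, file_name):
    # Write each chunk to "<file_name>.part-<idx>" in binary mode.
    # Returns the part paths sorted by chunk index.
    paths = []
    for idx, chunk in sorted(data_dict.items()):
        part_path = "{}.part-{}".format(file_name, idx)
        with open(part_path, "wb") as fh:
            fh.write(chunk)
        paths.append(part_path)
    return paths


def join_parts(part_paths, file_name):
    # Concatenate the part files, in the given order, into the final file.
    with open(file_name, "wb") as out:
        for part_path in part_paths:
            with open(part_path, "rb") as fh:
                out.write(fh.read())


# Stand-in for the downloaded chunks, deliberately inserted out of order.
chunks = {1: b"\nworld", 0: b"hello", 2: b"\r\n!"}
target = os.path.join(tempfile.mkdtemp(), "image.jpg")
parts = write_parts(chunks, target)
join_parts(parts, target)
```

The part files stay on disk after joining, so they can be inspected or re-used; every file is opened in binary mode so no newline translation occurs.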


2015-09-14 05:25:23

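Still not getting the correct file. 102331 bytes to download. done: got 5 chunks, total 102331 bytes --- 1.9279999733 seconds --- Finished Writing file image.jpg file size 102948 bytes – AKM

I tried it on the couple of files you mentioned, and it definitely writes the correct data to the file. Literally the only line that needs to change is with open(fileName, 'w') as fh: which becomes with open(fileName, 'wb') as fh: –

Thanks.. could you also tell me how I can keep the part files too? – AKM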
