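Python: simple multithreaded download produces a corrupted file
This is my first post. I have been programming in Python and am currently developing a multithreaded downloader. The problem is that my files (JPGs are my target) get corrupted. Also, with the following input: http://www.aumathletics.com/images_web/headerAUMLogo.jpg
it shows an error,
while with the input: http://www.nasa.gov/images/content/607800main_kepler1200_1600-1200.jpg
the file gets corrupted.
Here is the code: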
import os, sys, requests
import threading
import urllib2
import time
URL = sys.argv[1]
def buildRange(value, numsplits):
    lst = []
    for i in range(numsplits):
        if i == 0:
            lst.append('%s-%s' % (i, int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
        else:
            lst.append('%s-%s' % (int(round(1 + i * value/(numsplits*1.0),0)), int(round(1 + i * value/(numsplits*1.0) + value/(numsplits*1.0)-1, 0))))
    return lst
def main(url=None, splitBy=5):
    start_time = time.time()
    if not url:
        print "Please Enter some url to begin download."
        return
    fileName = "image.jpg"
    sizeInBytes = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers.get('content-length', None)
    print "%s bytes to download." % sizeInBytes
    if not sizeInBytes:
        print "Size cannot be determined."
        return
    dataDict = {}
    # split total num bytes into ranges
    ranges = buildRange(int(sizeInBytes), splitBy)
    def downloadChunk(idx, irange):
        req = urllib2.Request(url)
        req.headers['Range'] = 'bytes={}'.format(irange)
        dataDict[idx] = urllib2.urlopen(req).read()
    # create one downloading thread per chunk
    downloaders = [
        threading.Thread(
            target=downloadChunk,
            args=(idx, irange),
        )
        for idx,irange in enumerate(ranges)
    ]
    # start threads, let run in parallel, wait for all to finish
    for th in downloaders:
        th.start()
    for th in downloaders:
        th.join()
    print 'done: got {} chunks, total {} bytes'.format(
        len(dataDict), sum((
            len(chunk) for chunk in dataDict.values()
        ))
    )
    print "--- %s seconds ---" % str(time.time() - start_time)
    if os.path.exists(fileName):
        os.remove(fileName)
    # reassemble file in correct order
    with open(fileName, 'w') as fh:
        for _idx,chunk in sorted(dataDict.iteritems()):
            fh.write(chunk)
    print "Finished Writing file %s" % fileName
    print 'file size {} bytes'.format(os.path.getsize(fileName))
if __name__ == '__main__':
    main(URL)
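The indentation here may be wrong, so here is the code on pastebin(dot)com/wGEkp878
I would be grateful if someone could point out the error.
EDIT: as suggested by someone: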
def buildRange(value, numsplits):
    lst = []
    for i in range(numsplits):
        first = i if i == 0 else buildRange().start(i, value, numsplits)
        second = buildRange().end(i, value, numsplits)
        lst.append("{}-{}".format(first, second))
    return lst
Can anyone tell me how to keep the downloaded part files, saved under names such as part1, part2, and so on?
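As a first guess, it looks like your buildRange function is more complicated than it needs to be, and that may also be your problem. More importantly, and I'm sorry this isn't an answer to your question, a multithreaded download like this will almost certainly take more time than a single request. The reason is that although all your data is being downloaded at the same time, you are still limited by bandwidth, and now you also have a lot of other things going on. It's a cool experiment though, and definitely worth finishing.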
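The simplification this comment hints at could use integer arithmetic instead of floating-point rounding. The following is only a minimal sketch, written for Python 3; the name `build_ranges` is hypothetical and not from the original post. It produces inclusive, non-overlapping byte ranges that together cover exactly `size` bytes.

```python
def build_ranges(size, num_splits):
    """Split `size` bytes into at most `num_splits` inclusive 'start-end' ranges."""
    # base is the minimum chunk length; the first `extra` chunks get one more byte
    base, extra = divmod(size, num_splits)
    ranges = []
    start = 0
    for i in range(num_splits):
        length = base + (1 if i < extra else 0)
        if length == 0:
            # more splits than bytes: skip empty chunks
            continue
        end = start + length - 1  # HTTP byte ranges are inclusive
        ranges.append("{}-{}".format(start, end))
        start = end + 1
    return ranges


print(build_ranges(10, 5))
```

Each string can be used as in the question's `downloadChunk`, for example `'bytes={}'.format(irange)`.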
Can you tell me how to store the downloaded parts as files named part 1, 2, 3, 4, etc.? – AKM
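One possible way to keep the parts, sketched here for Python 3 under assumptions: the helper names `part_name`, `save_part` and `join_parts` and the `.partN` suffix are hypothetical, and each chunk is assumed to be the bytes object a download thread received. Every chunk is written to its own file in binary mode, and the parts are concatenated in index order afterwards.

```python
import os


def part_name(file_name, idx):
    # idx is zero-based; files are named image.jpg.part1, image.jpg.part2, ...
    return "{}.part{}".format(file_name, idx + 1)


def save_part(file_name, idx, chunk):
    """Write one downloaded chunk (bytes) to its own part file."""
    path = part_name(file_name, idx)
    with open(path, "wb") as fh:  # binary mode: bytes are written unchanged
        fh.write(chunk)
    return path


def join_parts(file_name, num_parts, keep_parts=True):
    """Concatenate part files 1..num_parts into file_name, in order."""
    with open(file_name, "wb") as out:
        for idx in range(num_parts):
            path = part_name(file_name, idx)
            with open(path, "rb") as fh:
                out.write(fh.read())
            if not keep_parts:
                os.remove(path)
```

In the question's code, `downloadChunk` could call `save_part(fileName, idx, data)` instead of storing the data in `dataDict`, and `join_parts(fileName, len(ranges))` could run after all threads have been joined.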
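Your original buildRange seems to work, but the new one is much better. The real problem seems to be that extra newline characters are being added! Every time a '\n' is encountered, an extra 0x0D is inserted before it.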
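The inserted 0x0D bytes are consistent with the output file being opened in text mode (`'w'`) on Windows, where every `\n` written is translated to `\r\n`. A minimal sketch of reassembling the file in binary mode instead, assuming Python 3; `write_chunks` is a hypothetical helper and `data_dict` mirrors the question's `dataDict` (chunk index mapped to bytes).

```python
def write_chunks(file_name, data_dict):
    """Write chunks to file_name in index order without newline translation."""
    # 'wb' opens the file in binary mode, so b'\n' stays a single 0x0A byte
    with open(file_name, "wb") as fh:
        for _idx, chunk in sorted(data_dict.items()):
            fh.write(chunk)
```

In the Python 2 code from the question, the corresponding change would be `open(fileName, 'wb')` in place of `open(fileName, 'w')`.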