2014-10-07 223 views

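Download files from FTP or HTTP using filenames from a CSV - Python 3

I have a CSV file of products for an e-commerce site I am working on, plus FTP access to the image for each product (~15K products).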

I would like to use Python to download, over FTP or HTTP, only the images listed in the CSV and save them locally.

import urllib.request 
import urllib.parse 
import re 

url='http://www.fakesite.com/pimages/filename.jpg' 

split = urllib.parse.urlsplit(url) 
filename = split.path.split("/")[-1] 
urllib.request.urlretrieve(url, filename) 

print(filename) 

saveFile = open(filename,'r') 
saveFile.close() 

import csv 

with open('test.csv') as csvfile: 
    readCSV = csv.reader(csvfile, delimiter=",") 

    images = [] 

    for row in readCSV:
        image = row[14]

print(image) 

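My current code can extract the filename from a URL and save the file under that name. It can also pull the image filenames out of the CSV file. (The filename and the image name are exactly the same.) What I need it to do is take the filename from the CSV, append it to the end of the URL, and then save the downloaded file under that filename.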

I have since progressed to this:

import urllib.request 
import urllib.parse 
import re 
import os 
import csv 

with open('test.csv') as csvfile: 
    readCSV = csv.reader(csvfile, delimiter=",") 

    images = [] 

    for row in readCSV:
        image = row[14]

        images.append(image)


x ='http://www.fakesite.com/pimages/' 

url = os.path.join (x,image) 

split = urllib.parse.urlsplit(url) 
filename = split.path.split("/")[-1] 
urllib.request.urlretrieve(url,filename) 



saveFile = open(filename,'r') 
saveFile.close() 

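Now, this is great. It works perfectly. It pulls the correct filename from the CSV file, appends it to the end of the URL, downloads the file, and saves it under that filename.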

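However, I can't figure out how to make this work for multiple rows of the CSV file. As it stands, it takes only the last row and pulls the relevant information from it. Ideally, I would give it the CSV file with all the products in it, and it would go through and download every image, not just the last one.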

Answer


You are doing strange things...

import urllib.request 
import csv 

# the images list should be outside the with block 
images = [] 
IMAGE_COLUMN = 14 

with open('test.csv') as csvfile: 
    # read csv 
    readCSV = csv.reader(csvfile, delimiter=",") 
    for row in readCSV:
        # I guess 14 is the column-index of the image-name like 'image.jpg'
        # I've put it in some constant

        # now append all the image-names into the list
        images.append(row[IMAGE_COLUMN])

        # no need for the following
        # image = row[14]
        # images.append(image)

# make sure, root_url ends with a slash 
# x was some strange name for an url 
root_url = 'http://www.fakesite.com/pimages/' 

# iterate through the list 
for image in images: 
    # you don't need os.path.join, because that's operating system dependent. 
    # you don't need to urlsplit, because you have created the url yourself. 
    # you don't need to split the filename as it is the image name 
    # with the following line, the root_url must end with a slash 
    url = root_url + image 

    # urlretrieve saves the file as whatever image is into the current directory 
    urllib.request.urlretrieve(url, image) 
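
A side note on the os.path.join comment above: on Windows it joins with a backslash, which is not what a URL needs. If you would rather not rely on root_url ending with a slash, urllib.parse from the standard library can build the URL instead. This is only a minimal sketch: the helper name build_image_url is made up for illustration, and it assumes each CSV cell holds a bare filename with no directory part.

```python
from urllib.parse import quote, urljoin


def build_image_url(root_url, image_name):
    """Return root_url joined with image_name (hypothetical helper)."""
    # urljoin replaces the last path segment unless the base ends with '/',
    # so add the slash when it is missing.
    if not root_url.endswith('/'):
        root_url += '/'
    # quote() percent-encodes characters such as spaces that are not valid
    # in a URL path; strip() drops stray whitespace from the CSV cell.
    return urljoin(root_url, quote(image_name.strip()))


print(build_image_url('http://www.fakesite.com/pimages', 'my image.jpg'))
```

Plain string concatenation, as in the code above, is still fine when the filenames contain no spaces or other special characters.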

Or, in short, without the comments, this is all you need:

import urllib.request 
import csv 

IMAGE_COLUMN = 14 
ROOT_URL = 'http://www.fakesite.com/pimages/' 
images = [] 

with open('test.csv') as csvfile: 
    readCSV = csv.reader(csvfile, delimiter=",") 
    for row in readCSV:
        images.append(row[IMAGE_COLUMN])

for image in images: 
    url = ROOT_URL + image 
    urllib.request.urlretrieve(url, image) 
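
With ~15K products, some rows will probably be short or blank, or name an image that is missing on the server. A short row makes row[IMAGE_COLUMN] raise IndexError, and urlretrieve raises on an HTTP error, so either one would stop the run partway. Below is a hedged sketch that skips such rows and collects the failures instead of aborting. The function names read_image_names and download_all and the fetch parameter are hypothetical; the column index, CSV name, and placeholder URL are taken from the question.

```python
import csv
import urllib.request

IMAGE_COLUMN = 14  # column index taken from the question
ROOT_URL = 'http://www.fakesite.com/pimages/'  # placeholder host from the question


def read_image_names(csv_path, column=IMAGE_COLUMN):
    """Return the non-empty values of one column, skipping short rows."""
    names = []
    # newline='' is what the csv module documentation recommends for open()
    with open(csv_path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            if len(row) > column and row[column].strip():
                names.append(row[column].strip())
    return names


def download_all(names, root_url=ROOT_URL, fetch=urllib.request.urlretrieve):
    """Download each name from root_url; return a list of (name, error)."""
    failures = []
    for name in names:
        try:
            fetch(root_url + name, name)
        except OSError as exc:
            # URLError and HTTPError are subclasses of OSError;
            # record the failure and keep going with the next image
            failures.append((name, exc))
    return failures


# Usage (performs real downloads, so it is left commented out):
# for name, exc in download_all(read_image_names('test.csv')):
#     print('failed:', name, exc)
```

If the CSV has a header row, its column-14 value will simply show up in the failure list, since no image with that name exists. Passing fetch as a parameter is only a convenience so the loop can be tried out without touching the network.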

Wow. This works great. I apologize for doing strange things; I am very new to Python. Thank you for the help. – Jakoby 2014-10-08 17:15:30


This raised my next question, which I have posted below. – Jakoby 2014-10-09 21:34:00