Can't open images downloaded using the Beautiful Soup library

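I have a script that uses the BeautifulSoup library to download images from a web page. When I point it at a site such as http://www.google.com, the image downloads correctly to a folder on my desktop, and I can open and view it. However, when I use a site such as https://sites.google.com/site/imagesizetesting/one-1, the image appears to download to the correct folder on my desktop, but when I open it I get an error message saying "Paint cannot read this file. This is not a valid bitmap file, or its format is not currently supported."

I think this may be because the image path in the HTML of the Google home page is relative (/images/srpr/logo4w.png), whereas the path of the image on https://sites.google.com/site/imagesizetesting/one-1 is not relative (https://sites.google.com/site/imagesizetesting//rsrc/1370373631437/one-1/Test.png). I'm not sure how to tell whether the image source is what causes the problem or whether it is something else. Any ideas? Here is my code that parses the page and downloads the images.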

for image in soup.findAll("img"):
    print "Old Image Path: %(src)s" % image
    # Get file name
    filename = image["src"].split("/")[-1]
    # Get full path name if url has to be parsed
    parsedURL[2] = image["src"]
    image["src"] = '%s\%s' % (phonepath, filename)
    print 'New Path: %s' % image["src"]
    outpath = os.path.join(out, filename)

    # Retrieve images
    if image["src"].lower().startswith("http"):
        urlretrieve(image["src"], outpath)
        print image["src"].lower()
    else:
        # Constructs URL from tuple (parsedURL)
        urlretrieve(urlparse.urlunparse(parsedURL), outpath)
        print image["src"].lower()


2013-06-05 johns4ta

If you save the image from your browser to your hard drive, can Paint open it?

Yes, I am able to. – johns4ta

I figured it out. Thanks for the help, Paulo. – johns4ta
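Since the image opens fine when saved from the browser, the file written by the script probably does not contain image data at all. One way to check this is to inspect the first bytes of the saved file. The following is a minimal sketch in Python 3, assuming the target is a PNG; the helper names `looks_like_png` and `looks_like_html` are hypothetical and not part of the original script.

```python
# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data):
    """Return True if the bytes start with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


def looks_like_html(data):
    """Return True if the bytes look like the start of an HTML page.

    This is a rough heuristic: it only checks for a leading doctype or <html> tag.
    """
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def describe_file(path):
    """Read the start of a downloaded file and say what it appears to contain."""
    with open(path, "rb") as handle:
        data = handle.read(512)
    if looks_like_png(data):
        return "png"
    if looks_like_html(data):
        return "html"
    return "unknown"
```

If `describe_file` reports "html" for a file named Test.png, the server most likely returned an error page because the requested URL was wrong.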

I figured it out! Here is my updated code in case anyone runs into a similar problem.

# Python 2 imports used by this fragment
import os
import urlparse
from urllib import urlretrieve

# soup, parsedURL, phonepath and out are defined earlier in the script (not shown)
for image in soup.findAll("img"):
    print "Old Image Path: %(src)s" % image
    # Get file name
    filename = image["src"].split("/")[-1]
    # Keep the original src in the path slot of parsedURL before it is overwritten
    parsedURL[2] = image["src"]
    image["src"] = '%s\%s' % (phonepath, filename)
    # Old file path (local to computer)
    #image["src"] = '%s\%s' % (out, filename)
    print 'New Path: %s' % image["src"]
    outpath = os.path.join(out, filename)

    # Retrieve images: test the original src, not the rewritten image["src"]
    if parsedURL[2].lower().startswith("http"):
        # Old call: urlretrieve(image["src"], outpath)
        # src is already an absolute URL, so fetch it directly
        urlretrieve(parsedURL[2], outpath)
        print image["src"].lower()
    else:
        # src is relative: construct the full URL from its components (parsedURL)
        print "HTTP INFO " + urlparse.urlunparse(parsedURL)
        print "HTTP INFO " + image["src"].lower()
        urlretrieve(urlparse.urlunparse(parsedURL), outpath)
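A likely reading of why this fixes the problem: in the first version, `image["src"]` had already been replaced with the local path by the time the `startswith("http")` test ran, so an absolute image URL always fell through to the `else` branch and was pasted into the path slot of the page URL. The sketch below, written in Python 3 rather than the Python 2 of the script, reproduces that malformed URL and shows `urljoin` as an alternative that handles relative and absolute `src` values in one call. The function names `naive_image_url` and `resolve_image_url` are hypothetical.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def naive_image_url(page_url, src):
    """Mimic the original else branch: put src into the path slot of the page URL."""
    parts = list(urlparse(page_url))
    parts[2] = src
    return urlunparse(parts)


def resolve_image_url(page_url, src):
    """Resolve src against the page URL; an absolute src is returned unchanged."""
    return urljoin(page_url, src)


page = "https://sites.google.com/site/imagesizetesting/one-1"
absolute_src = ("https://sites.google.com/site/imagesizetesting"
                "//rsrc/1370373631437/one-1/Test.png")
relative_src = "/images/srpr/logo4w.png"

# The naive approach nests the absolute URL inside the page's own URL.
print(naive_image_url(page, absolute_src))
# urljoin leaves the absolute URL alone and expands the relative one.
print(resolve_image_url(page, absolute_src))
print(resolve_image_url("http://www.google.com", relative_src))
```

With `urljoin`, the script would not need the `startswith("http")` branch at all, although that is a design suggestion rather than what the updated code above does.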


2013-06-05 19:31:47 johns4ta
