How do I extract images from html files in a directory?

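This is a follow-up to this question: How do I parse every html file in a directory for images? Essentially, I have a directory of html files, each of which contains images that I would like to save separately in the same directory.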

After making the suggested modifications to the program, I still get an error:

Image: theme/pfeil_grau.gif 

Traceback (most recent call last):
  File "C:\Users\gokalraina\Desktop\modfile.py", line 25, in <module>
    im = Image.open(image)
  File "C:\Python27\lib\site-packages\PIL\Image.py", line 1956, in open
    prefix = fp.read(16)
TypeError: 'NoneType' object is not callable

Here is the modified code I am using (thanks to nightcracker).

import os, os.path
import Image
from BeautifulSoup import BeautifulSoup as bs

path = 'C:\Users\gokalraina\Desktop\derm images'

for root, dirs, files in os.walk(path):
    for f in files:
        soup = bs(open(os.path.join(root, f)).read())
        for image in soup.findAll("img"):
            print "Image: %(src)s" % image
            im = Image.open(image)
            im.save(path+image["src"], "JPEG")

2012-03-07 Wandering Sophist

Please include the entire output up to that point, including the output from the 'print "Image: %s" % image' bit. – 2012-03-08 03:16:36

I have added it; there was only one image before it broke. – 2012-03-08 04:15:33

The code is passing a BeautifulSoup.Tag object to Image.open, but Image.open expects a path or a file object. You can get the relative path to the image with image["src"], so the code would be:

im = Image.open(image["src"])
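As a rough illustration of where the odd error message probably comes from: BeautifulSoup 3 treats an unknown attribute on a Tag as a lookup for a child tag and returns None when there is none, so the fp.read(16) call in the traceback ends up calling None. The sketch below reproduces that with a hypothetical stand-in class, FakeTag, written for Python 3; it is not the real BeautifulSoup.Tag, and read_prefix is only an imitation of the failing line in PIL.

```python
class FakeTag(object):
    """Hypothetical stand-in for a BeautifulSoup 3 Tag (not the real class)."""

    def __getattr__(self, name):
        # Assumption: like BeautifulSoup 3, treat an unknown attribute as a
        # child-tag lookup that finds nothing and therefore yields None.
        return None


def read_prefix(fp):
    # Imitates the failing line from the traceback: prefix = fp.read(16)
    return fp.read(16)


try:
    read_prefix(FakeTag())
except TypeError as exc:
    # fp.read evaluated to None, so calling it raises TypeError.
    message = str(exc)
    print(message)
```

Passing image["src"] instead hands Image.open a plain string, which it treats as a filename.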

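However, this path is exactly as it is written in the HTML file, so it is probably relative to the directory containing the HTML file. If so, joining root to image["src"] will give the full path to each image: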

im = Image.open(os.path.join(root, image["src"]))
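Putting the pieces together, the whole loop might look roughly like the sketch below. It is not the asker's script: it assumes Python 3 and deliberately swaps in the standard library's html.parser for BeautifulSoup and shutil.copy for PIL, so the images are copied unchanged rather than re-encoded as JPEG. The names ImgSrcCollector, find_local_images and copy_images are made up for this example.

```python
import os
import shutil
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_local_images(path):
    """Return on-disk paths of images referenced by HTML files under path."""
    found = []
    for root, dirs, files in os.walk(path):
        for f in files:
            if not f.lower().endswith((".html", ".htm")):
                continue
            collector = ImgSrcCollector()
            html_path = os.path.join(root, f)
            with open(html_path, encoding="utf-8", errors="replace") as fh:
                collector.feed(fh.read())
            for src in collector.sources:
                # Resolve src against the directory holding the HTML file.
                candidate = os.path.normpath(os.path.join(root, src))
                if os.path.isfile(candidate):
                    found.append(candidate)
    return found


def copy_images(path, dest):
    """Copy every referenced image into dest and return the new paths."""
    os.makedirs(dest, exist_ok=True)
    copied = []
    for image_path in find_local_images(path):
        target = os.path.join(dest, os.path.basename(image_path))
        # Skip images that already live at the target location.
        if os.path.abspath(image_path) != os.path.abspath(target):
            shutil.copy(image_path, target)
        copied.append(target)
    return copied
```

Collecting the paths first and copying afterwards keeps the directory walk from being affected by files written into the same tree. Images whose src does not resolve to a file on disk are silently skipped here; a real script may prefer to report them.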

2012-03-08 06:27:05 Devourant

Image.open() probably cannot handle remote files. You would be better off using the urllib or urllib2 module to download the images.
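If some src values do turn out to be remote, one way to handle both cases is sketched below. It assumes Python 3, where the urllib and urllib2 functionality lives in urllib.request and urllib.parse; is_remote and fetch_image are hypothetical helper names, and base_dir stands for the directory containing the HTML file.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def is_remote(src):
    """True when src is an http(s) URL rather than a path on disk."""
    return urlparse(src).scheme in ("http", "https")


def fetch_image(src, base_dir, dest_path):
    """Save the image named by src to dest_path, downloading it if remote."""
    if is_remote(src):
        # Stream the response body straight into the destination file.
        with urlopen(src) as response, open(dest_path, "wb") as out:
            shutil.copyfileobj(response, out)
    else:
        # Local reference: resolve it against the HTML file's directory.
        shutil.copyfile(os.path.join(base_dir, src), dest_path)
    return dest_path
```

For the src in the question, theme/pfeil_grau.gif, is_remote returns False, so the local branch would be taken.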

2012-03-08 03:21:06 Dikei

The html files are in a directory on the hard drive, not coming directly from the Internet. – 2012-03-08 03:50:10

Then it may be that it cannot handle file URIs; try converting the link to a local path first: http://stackoverflow.com/questions/5977576/is-there-a-convenient-way-to-map-a-file-uri-os-path – Dikei 2012-03-08 04:09:59

What is a file URI, and which links need to be converted? – 2012-03-08 04:15:05
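For context, a file URI is a URL of the form file:///C:/some/path that names a file on the local disk, and only src values written that way would need converting. A minimal sketch of the conversion, assuming Python 3 (file_uri_to_path is a hypothetical helper name, and the example path in the comment is invented):

```python
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_uri_to_path(uri):
    """Convert a file:// URI to a local path; return other values unchanged."""
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return uri
    # url2pathname decodes %-escapes and applies the platform's path rules.
    return url2pathname(parsed.path)


# Example call with an invented path; the exact result depends on the OS.
print(file_uri_to_path("file:///tmp/derm%20images/a.gif"))
```

A plain relative src such as theme/pfeil_grau.gif has no file scheme, so it passes through unchanged.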
