2012-07-19 39 views
6

Downloading an image with Python

I need to find an image whose HTML looks like this:

... 
<a href="/example/1"> 
    <img id="img" src="http://example.net/example.jpg" alt="Example" /> 
</a> 
... 

I am using lxml and requests.

Here is the code:

import lxml 
from lxml import html 
import requests 

url = 'http://www.example.com' 

r = requests.get(url) 
tree = lxml.html.fromstring(r.content) 

img = tree.get_element_by_id("img") 
f = open("image.jpg",'wb') 
f.write(requests.get(img['src']).content) 

But I get an error:

Traceback (most recent call last): 
    File "/Users/Name/Documents/Python/Example/Script.py", line 13, in <module> 
    s = requests.get(img['src']) 
    File "/Library/Python/2.6/site-packages/lxml/lxml.etree.pyx", line 1052, in lxml.etree._Element.__getitem__ (src/lxml/lxml.etree.c:38272) 
TypeError: 'str' object cannot be interpreted as an index 

Any suggestions?

+2

Suggestion: read the documentation, and please fix the HTML. – dav1d 2012-07-19 18:24:34

Answers

4

Try f.write(requests.get(img.attrib['src']).content)
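For context on why this helps: square brackets on an lxml element index its child elements, so a string key such as 'src' raises the TypeError shown in the question. Attributes are read through .attrib or .get() instead. The following is a minimal sketch, not the asker's actual script: SAMPLE_HTML is an inline stand-in for the downloaded page, and image_src and its base_url parameter are hypothetical names introduced here for illustration.

```python
from urllib.parse import urljoin

import lxml.html

# Inline stand-in for the page content (assumption: mirrors the question's HTML).
SAMPLE_HTML = b"""
<html><body>
<a href="/example/1">
    <img id="img" src="http://example.net/example.jpg" alt="Example" />
</a>
</body></html>
"""


def image_src(html_bytes, element_id="img", base_url=None):
    """Return the src attribute of the element with the given id.

    Hypothetical helper: resolves a relative src against base_url if given.
    """
    root = lxml.html.fromstring(html_bytes)
    # get_element_by_id raises KeyError when no element has that id.
    element = root.get_element_by_id(element_id)
    # element.get("src") reads an attribute; element["src"] would try to
    # index child elements and fail with a TypeError.
    src = element.get("src")
    if src is None:
        raise ValueError("element %r has no src attribute" % element_id)
    return urljoin(base_url, src) if base_url else src


print(image_src(SAMPLE_HTML))
```

The returned URL can then be passed to requests.get(...) and the response's .content written to a file opened in 'wb' mode, as in the question.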

1
import lxml.html 
import requests 

url = 'http://www.example.com/' 
tree = lxml.html.parse(url) 
img = tree.get_element_by_id('img') 
img_url = img.attrib['src'] 

with open('image.jpg', 'wb') as outf: 
    data = requests.get(img_url).content 
    outf.write(data) 
+0

img = tree.get_element_by_id('img') doesn't work this time. It says: Traceback (most recent call last): File "/Users/Example/Documents/Python/Example/Script.py", line 6, in <module> img = tree.get_element_by_id('img') AttributeError: 'lxml.etree._ElementTree' object has no attribute 'get_element_by_id'. I tried replacing tree = lxml.html.parse(url) with tree = lxml.html.fromstring(requests.get(url).content) and now it works. Thanks for your help! – Jiloc 2012-07-20 00:26:57
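The difference the comment runs into can be sketched as follows: lxml.html.parse returns an ElementTree wrapper, which lacks get_element_by_id, while lxml.html.fromstring returns the root element directly. Calling getroot() on the parsed tree is another way to reach the same element. This is an illustrative sketch only; SAMPLE_HTML and the io.BytesIO wrapper are assumptions standing in for the real page, so that no network access is needed.

```python
import io

import lxml.html

# Inline stand-in for the page content (assumption: mirrors the question's HTML).
SAMPLE_HTML = b"""
<html><body>
<a href="/example/1">
    <img id="img" src="http://example.net/example.jpg" alt="Example" />
</a>
</body></html>
"""

# parse() accepts a filename, URL, or file-like object and returns an ElementTree.
tree = lxml.html.parse(io.BytesIO(SAMPLE_HTML))
tree_has_lookup = hasattr(tree, "get_element_by_id")  # the tree wrapper lacks it

# getroot() yields the root HtmlElement, which does provide the lookup.
img_from_parse = tree.getroot().get_element_by_id("img")

# fromstring() returns the root element directly, so no getroot() is needed.
img_from_string = lxml.html.fromstring(SAMPLE_HTML).get_element_by_id("img")

print(tree_has_lookup)
print(img_from_parse.get("src"))
print(img_from_string.get("src"))
```

Fetching the page with requests and passing r.content to fromstring, as the comment does, also keeps the HTTP handling (status checks, timeouts, headers) in one library.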