Why does my code return IndexError: list index out of range?


from bs4 import BeautifulSoup 
import urllib2 
import urllib 
import os 
url=urllib.urlopen("https://www.google.co.in/search?q=cow&biw=1242&bih=606&source=lnms&tbm=isch&sa=X&ved=0ahUKEwi21oLAqqzKAhXNjo4KHVs0DkgQ_AUIBigB") 
soup=BeautifulSoup(url) 
li=soup.find_all('a') 
for links in li: 
    imgUrl=links.get('href') 
    sp1=imgUrl.split('imgurl=')[1] 
    sp2=sp1.split('&amp')[0] 
    urllib.urlretrieve(sp2)

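I want to download all the images from this web page. The link I am fetching is a Google Images results page, and I am working from its HTML source. The code works when I run it for a single image on its own, but it fails when I use find_all to download multiple images. Why does my code return IndexError: list index out of range?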


2016-01-15 anonymous

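Before asking a question, you need to do some debugging. If the failing line is the first split, try printing what imgUrl.split('imgurl=') returns. If the failing line is the second one, print the result of sp1.split('&amp').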
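As a minimal sketch of the debugging step suggested above, the snippet below prints what split returns for two href values. Both values are made up for illustration and are not taken from the real page, and the snippet is written for Python 3. When the separator does not occur in the string, split returns a one-element list, so index [1] does not exist.

```python
# Hypothetical href values, chosen only to illustrate the failure.
hrefs = [
    "/imgres?imgurl=http://example.com/cow.jpg&imgrefurl=http://example.com/",
    "/search?q=cow&tbm=isch",
]

for href in hrefs:
    parts = href.split("imgurl=")
    # Show how many pieces the split produced and what they are.
    print(len(parts), parts)
    try:
        print("image part:", parts[1])
    except IndexError as exc:
        # Raised when "imgurl=" does not occur in the href.
        print("IndexError:", exc)
```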

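Note the following problems in your code:

1) Not every imgUrl contains 'imgurl=', so split('imgurl=')[1] raises IndexError for links without it.

2) Not every imgUrl contains '&amp'.

3) imgUrl may not be a valid URL at all (for example, "javascript:void(0)").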
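The three cases above can be sketched as a single guard function. This is only an illustration under stated assumptions: extract_img_url is a hypothetical helper name, the sample href values in the usage note are made up, and it swaps the string splitting for the Python 3 standard-library query parser (urllib.parse.parse_qs), which also avoids depending on whether the separator appears as '&' or '&amp' in the parsed attribute.

```python
from urllib.parse import urlparse, parse_qs  # Python 3 standard library


def extract_img_url(href):
    """Return the imgurl query parameter of href, or None if it is unusable.

    Hypothetical helper for illustration, not part of BeautifulSoup or urllib.
    """
    if not href:
        return None  # the <a> tag had no href at all
    query = parse_qs(urlparse(href).query)
    values = query.get("imgurl")
    if not values:
        return None  # case 1: the link carries no imgurl parameter
    candidate = values[0]
    if not candidate.startswith(("http://", "https://")):
        return None  # case 3: the value is not a downloadable URL
    return candidate
```

For example, extract_img_url("javascript:void(0)") and extract_img_url("/search?q=cow") both return None, so the caller can skip those links instead of indexing into a list that has only one element.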

With these points in mind, I made some modifications to the code:

from bs4 import BeautifulSoup
import urllib2
import urllib
import os
url=urllib.urlopen("https://www.google.co.in/search?q=cow&biw=1242&bih=606&source=lnms&tbm=isch&sa=X&ved=0ahUKEwi21oLAqqzKAhXNjo4KHVs0DkgQ_AUIBigB")
soup=BeautifulSoup(url)
# href=True skips <a> tags that have no href attribute
li=soup.findAll('a', href=True)
for links in li:
    imgUrl=links.get('href')
    # only split when the separator is present, so [1] always exists
    if 'imgurl=' in imgUrl:
        imgUrl=imgUrl.split('imgurl=')[1]
    if '&amp' in imgUrl:
        imgUrl=imgUrl.split('&amp')[0]
    try:
        urllib.urlretrieve(imgUrl)
    except Exception:
        continue # invalid imgUrl
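The modified code above is Python 2 and calls urlretrieve with no file name, in which case the download is stored in a temporary file. The following is a rough Python 3 sketch of the download step, where urlretrieve lives in urllib.request. The helper names local_filename and download_all and the images directory are assumptions made for this example, not something from the original post.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_filename(img_url, index):
    """Build a local file name from the URL path, with a numbered fallback."""
    name = os.path.basename(urlparse(img_url).path)
    return name if name else "image_{}".format(index)


def download_all(img_urls, dest_dir="images"):
    """Download each URL into dest_dir and return the paths that were saved."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, img_url in enumerate(img_urls):
        target = os.path.join(dest_dir, local_filename(img_url, index))
        try:
            urlretrieve(img_url, target)
        except (OSError, ValueError) as exc:
            # URLError and HTTPError are OSError subclasses;
            # ValueError covers strings that are not usable URLs.
            print("skipping", img_url, exc)
            continue
        saved.append(target)
    return saved
```

Two URLs that end in the same file name would overwrite each other here, so a real script may want to prefix the index to every name.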


2016-01-15 22:27:59 Quinn
