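Stuck on web-scraping code

I have the following code, which is meant to visit a web page, download all of the comics from the site, and store them on my computer. The first image downloads fine, but the loop that moves on to the previous pages of the site seems to be broken. If anyone could take a look at the code and help, it would be much appreciated. The error I get is: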

    Traceback (most recent call last):
      File "C:\Users\528000\Desktop\kids print\Comic-gather.py", line 41, in <module>
        prevLink = soup.select('a[class="prevLink"]')[0]
    IndexError: list index out of range

    import requests, os, bs4

    url = 'http://darklegacycomics.com'
    os.makedirs('darklegacy', exist_ok=True)
    while not url.endswith('#'):
        # Download the page.
        print('Downloading page %s...' % url)
        res = requests.get(url)
        res.raise_for_status()
        soup = bs4.BeautifulSoup(res.text)
        comicElem = soup.select('.comic img')
        if comicElem == []:
            print('Could not find comic image.')
        else:
            try:
                comicUrl = 'http://darklegacycomics.com' + comicElem[0].get('src')
                # Download the image.
                print('Downloading image %s...' % (comicUrl))
                res = requests.get(comicUrl)
                res.raise_for_status()
            except requests.exceptions.MissingSchema:
                # skip this comic
                prevLink = soup.select('.prevlink')[0]
                url = 'http://darklegacycomics.com' + prevLink.get('href')
                continue
            # Save the image to ./darklegacy.
            imageFile = open(os.path.join('darklegacy', os.path.basename(comicUrl)), 'wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()
        # Get the Prev button's url.
        prevLink = soup.select('a[class="prevLink"]')[0]
        url = 'http://darklegacycomics.com' + prevLink.get('href')
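
For reference, the `IndexError` means that `soup.select('a[class="prevLink"]')` returned an empty list on the page being processed, so indexing it with `[0]` fails. Note also that the script spells the class two ways (`.prevlink` and `prevLink`); which one, if either, matches the site's actual markup is not something I have verified. A minimal sketch of guarding the lookup is below. The helper name `resolve_first_link` is hypothetical, and the plain dicts in the usage lines only stand in for the tag objects that `soup.select()` would return (both support `.get('href')`).

```python
from urllib.parse import urljoin


def resolve_first_link(base_url, matches):
    """Return the absolute URL of the first matched link, or None.

    `matches` is expected to be the list returned by soup.select(...);
    this hypothetical helper only relies on each item having .get('href').
    """
    if not matches:
        # The selector matched nothing: report that instead of raising IndexError.
        return None
    href = matches[0].get('href')
    if not href:
        # The element exists but carries no usable href attribute.
        return None
    # urljoin handles relative and absolute hrefs, unlike string concatenation.
    return urljoin(base_url, href)


# Stand-in data for illustration only; real code would pass soup.select(...) results.
print(resolve_first_link('http://darklegacycomics.com', []))
print(resolve_first_link('http://darklegacycomics.com', [{'href': '122.html'}]))
```

In the loop, a `None` result could be used as the signal to `break`, so the script stops cleanly on a page that has no previous link.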