I'm trying to download the worksheet for this workout program, which is split across different days. All that needs to change is the number appended to the end of the link. Here is my code; it loops through the list of links while parsing each page.
import urllib
import urllib.request
from bs4 import BeautifulSoup
import re
import os

theurl = "http://www.muscleandfitness.com/workouts/workout-routines/gain-10-pounds-muscle-4-weeks-1?day="
urls = []
count = 1
while count < 29:
    urls.append(theurl + str(count))
    count += 1
print(urls)

for url in urls:
    thepage = urllib
    thepage = urllib.request.urlopen(urls)
    soup = BeautifulSoup(thepage, "html.parser")
    init_data = open('/Users/paribaker/Desktop/scrapping/workout/4weekdata.txt', 'a')
    workout = []
    for data_all in soup.findAll('div', {'class': "b-workout-program-day-exercises"}):
        try:
            for item in data_all.findAll('div', {'class': "b-workout-part--item"}):
                for desc in item.findAll('div', {'class': "b-workout-part--description"}):
                    workout.append(desc.find('h4', {'class': "b-workout-part--exercise-count"}).text.strip("\n") + ",\t")
                    workout.append(desc.find('strong', {'class': "b-workout-part--promo-title"}).text + ",\t")
                    workout.append(desc.find('span', {'class': "b-workout-part--equipment"}).text + ",\t")
                for instr in item.findAll('div', {'class': "b-workout-part--instructions"}):
                    workout.append(instr.find('div', {'class': "b-workout-part--instructions--item workouts-sets"}).text.strip("\n") + ",\t")
                    workout.append(instr.find('div', {'class': "b-workout-part--instructions--item workouts-reps"}).text.strip("\n") + ",\t")
                    workout.append(instr.find('div', {'class': "b-workout-part--instructions--item workouts-rest"}).text.strip("\n"))
                    workout.append("\n*3")
        except AttributeError:
            pass
    init_data.write("".join(map(lambda x: str(x), workout)))
    init_data.close
The problem is that the server times out. I assume either it isn't iterating through the list properly, or it's adding literal text I don't need and crashing the server's parser. I also tried writing another script that grabs all the links and puts them in a text document, then reopens that text file in this script and iterates through it, but that gave me the same error. What do you think?
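For the text-file variant mentioned above, here is a minimal sketch (the file name and location are arbitrary choices for illustration). The key details are writing one URL per line and stripping the trailing newline when reading them back, so each entry is a clean single URL string:

```python
import tempfile

theurl = "http://www.muscleandfitness.com/workouts/workout-routines/gain-10-pounds-muscle-4-weeks-1?day="

# Write one URL per line to a scratch file.
links_path = tempfile.gettempdir() + "/4week_links.txt"
with open(links_path, "w") as f:
    for day in range(1, 29):
        f.write(theurl + str(day) + "\n")

# Read them back, stripping the newline from each line; a URL with a
# trailing "\n" would fail or fetch the wrong page.
with open(links_path) as f:
    urls = [line.strip() for line in f if line.strip()]
```

Either way the fetch loop must still pass one URL string at a time to `urlopen`, never the whole list.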
So I did use urls intentionally — urls is the new list I created on line 7 with theurl + count. –
Oh! Yes, now I see it — let me give that a try. –
Worked perfectly, thank you very much for the extra pair of eyes! –
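Pulling the comment-thread fix together: the loop was passing the whole `urls` list to `urlopen` instead of the single `url` string, so the request never targeted a valid address. A minimal corrected sketch (the `timeout` value and the `fetch_day` helper name are my own additions, not from the original):

```python
import urllib.request

theurl = "http://www.muscleandfitness.com/workouts/workout-routines/gain-10-pounds-muscle-4-weeks-1?day="

# Build all 28 day-URLs up front; range(1, 29) replaces the manual counter.
urls = [theurl + str(day) for day in range(1, 29)]

def fetch_day(url):
    # The fix: urlopen() gets the single string `url`, not the list `urls`.
    # A timeout keeps one slow response from hanging the whole run.
    return urllib.request.urlopen(url, timeout=30).read()

if __name__ == "__main__":
    for url in urls:
        html = fetch_day(url)
        # feed `html` to BeautifulSoup(html, "html.parser") and parse
        # the workout divs exactly as in the question's loop body
```

The rest of the question's parsing code can stay as-is once `fetch_day(url)` replaces the broken `urllib.request.urlopen(urls)` call.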