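I have a list of links, and each link contains multiple pages. I have already found the number of pages in each subcategory, but now I want to loop over all the pages of each sub-link. So the first link will have 8 pages, the second link will have 6 pages, and so on. I want to reference this list in the for loop: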
lists = [8, 6, 5, 13, 10, 16, 13, 15, 4, 4, 5, 7, 2, 6, 6, 8, 9, 8, 3, 8, 8, 1, 6, 3, 2, 15, 5, 4, 2, 12, 18, 5, 2]
import bs4 as bs
import urllib.request
import pandas as pd
import urllib.parse
import re
#source = urllib.request.urlopen('https://messageboards.webmd.com/').read()
source = urllib.request.urlopen('https://messageboards.webmd.com').read()
soup = bs.BeautifulSoup(source,'lxml')
df = pd.DataFrame(columns = ['link'],data=[url.a.get('href') for url in soup.find_all('div',class_="link")])
lists =[]
lists2=[]
lists3=[]
page_links = []
for i in range(0,33):
    link = (df.link.iloc[i])
    req = urllib.request.Request(link)
    resp = urllib.request.urlopen(req)
    respData = resp.read()
    temp1=re.findall(r'Filter by</span>(.*?)data-pagedcontenturl',str(respData))
    temp1=re.findall(r'data-totalitems=(.*?)data-pagekey',str(temp1))[0]
    pageunm=round(int(re.sub("[^0-9]","",temp1))/10)
    lists.append(pageunm)
for j in lists:
    for y in range(1, j+1):
        url_pages = link + '#pi157388622=' + str(j)
        page_links.append(url_pages)
So for each 'j' you want to iterate over 'range(1, j+1)'? – jonrsharpe
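
One possible way to do what the comment describes is sketched below. In the question's last loop, `link` still holds whatever the earlier loop assigned last, and the URL is built from `j` (the page count) rather than `y` (the page number). The sketch pairs each link with its own page count using `zip`. The function name `build_page_links`, its `page_key` parameter, and the `example.com` sample data are hypothetical and used only for illustration. The `#pi157388622=` fragment is copied from the question, and whether the site actually returns a different page for each value has not been verified here.

```python
def build_page_links(links, page_counts, page_key="#pi157388622="):
    """Return one URL per page for every link.

    links and page_counts are parallel sequences: page_counts[i] is the
    number of pages behind links[i].
    """
    page_links = []
    # zip pairs each link with its own page count, so `link` is never
    # a leftover value from an earlier loop.
    for link, count in zip(links, page_counts):
        # y is the page number (1..count); it goes into the URL,
        # not the page count itself.
        for y in range(1, count + 1):
            page_links.append(link + page_key + str(y))
    return page_links


# Hypothetical sample data standing in for df.link and lists.
sample_links = ["https://example.com/forum-a", "https://example.com/forum-b"]
sample_counts = [3, 2]
page_links = build_page_links(sample_links, sample_counts)
for url in page_links:
    print(url)
```

With the data from the question, the call would presumably look like `build_page_links(df.link.tolist(), lists)`, assuming `lists` holds one page count per row of `df`.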