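Scraping multiple pages with Python and BeautifulSoup - only returns data from the last page

I am trying to scrape data with Python and BeautifulSoup by looping through multiple pages. My script works for a single page, but when I try to iterate over multiple pages, it only returns the data from the last page. I think there may be something wrong with the way I am looping or with how I store/append to the player_data list.

Here is what I have so far. Any help is much appreciated.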
#! python3
# downloadRecruits.py - Downloads espn college basketball recruiting database info
import requests, os, bs4, csv
import pandas as pd

# Starting url (class of 2007)
base_url = 'http://www.espn.com/college-sports/basketball/recruiting/databaseresults/_/class/2007/page/'

# Number of pages to scrape (Not inclusive, so number + 1)
pages = map(str, range(1,3))

# url for starting page
url = base_url + pages[0]

for n in pages:
    # Create url
    url = base_url + n

    # Parse data using BS
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    # Creating bs object
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    table = soup.find('table')

    # Get the data
    data_rows = soup.findAll('tr')[1:]
    player_data = []
    for tr in data_rows:
        tdata = []
        for td in tr:
            tdata.append(td.getText())
            if td.div and td.div['class'][0] == 'school-logo':
                tdata.append(td.div.a['href'])
        player_data.append(tdata)

print(player_data)
Add 4 spaces before 'print(player_data)'. – PRMoureu
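
Indenting the print call shows each page's rows as that page is processed, but player_data is still reset to an empty list on every pass through the loop, so nothing carries over from one page to the next. If the goal is one combined list, a minimal sketch is to create the list once before the loop and extend it per page. The helper names scrape_pages, fetch_html and parse_rows below are hypothetical (they are not in the original script), the 30-second timeout is an assumption, and the school-logo check uses class membership rather than the first class name.

```python
def fetch_html(url):
    """Download one page and return its HTML text."""
    # Imported here so the loop logic below can be exercised without
    # network access or third-party packages installed.
    import requests
    res = requests.get(url, timeout=30)  # timeout value is an assumption
    res.raise_for_status()
    return res.text


def parse_rows(html):
    """Return one list of cell values per table row, skipping the header row."""
    import bs4
    soup = bs4.BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr")[1:]:
        tdata = []
        for td in tr.find_all("td"):
            tdata.append(td.get_text())
            div = td.div
            # Keep the school link when the cell holds a school-logo div.
            if div and div.a and "school-logo" in div.get("class", []):
                tdata.append(div.a.get("href"))
        rows.append(tdata)
    return rows


def scrape_pages(base_url, page_numbers, fetch=fetch_html, parse=parse_rows):
    """Collect the rows from every page into a single list."""
    player_data = []  # created once, before the loop, so earlier pages are kept
    for n in page_numbers:
        url = base_url + str(n)
        print('Downloading page %s...' % url)
        player_data.extend(parse(fetch(url)))
    return player_data
```

With the base_url from the script, this could be called as player_data = scrape_pages(base_url, range(1, 3)). Passing the range directly also avoids indexing into the result of map, which is an iterator rather than a list in Python 3.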