Web scraping both HTML text and image links using Python BeautifulSoup
I'm new to Python and am trying to scrape a table from this URL using BeautifulSoup: http://www.espn.com/college-sports/basketball/recruiting/databaseresults?firstname=&lastname=&class=2007&starsfilter=GT&stars=0&ratingfilter=GT&rating=&positionrank=&sportid=4294967265&collegeid=&conference=&visitmonth=&visityear=&statuscommit=Commitments&statusuncommit=Uncommited&honor=&region=&state=&height=&weight=
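So far I've worked out how to pull the table data for each player's row, as well as the link to the school logo in each row. However, I can't combine the two. I'd like to extract the table data for each player ('player_data' in the code below) together with the corresponding school logo image link ('logo_links'), and then write one row per player to the saved CSV file.
Here is what I have so far. Thanks in advance for your help.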
#! python3
# downloadRecruits.py - Downloads espn college basketball recruiting database info
import csv
import bs4
import requests
# Starting url (class of 2007)
url = 'http://www.espn.com/college-sports/basketball/recruiting/databaseresults?firstname=&lastname=&class=2007&starsfilter=GT&stars=0&ratingfilter=GT&rating=&positionrank=&sportid=4294967265&collegeid=&conference=&visitmonth=&visityear=&statuscommit=Commitments&statusuncommit=Uncommited&honor=&region=&state=&height=&weight='
# Download the page
print('Downloading page %s...' % url)
res = requests.get(url)
res.raise_for_status()
# Creating bs object
soup = bs4.BeautifulSoup(res.text, "html.parser")
# Get the data
# Skip the header row; each remaining <tr> is one player
data_rows = soup.find_all('tr')[1:]
# One list of cell texts per player row
player_data = [[td.get_text() for td in row.find_all('td')] for row in data_rows]
logo_links = [a['href'] for div in soup.find_all("div", attrs={"class": "school-logo"}) for a in div.find_all('a')]
# Saving only player_data
# newline='' stops the csv module from writing blank lines between rows on Windows
with open('recruits2.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(player_data)
Is 'list(zip(player_data, logo_links))' what you want here? –
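@ViníciusAguiar That lines the two lists up nicely, but I'd like 'logo_links' to become part of the 'player_data' list. After zipping the lists as you suggested, when I export to CSV all of 'player_data' ends up in one column and 'logo_links' in a second column: https://d1ax1i5f2y3x71.cloudfront.net/items/2817413i333G1A3k1N44/Image%202017-08-12%20AT%203.18.15%20PM.png?X-CloudApp-Visitor-Id=2746470. My ideal output is a CSV with one column for each column of the existing table. – NateRattner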
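
One possible way to get the layout described in the last comment is to append each link to its player's row instead of pairing the two lists as tuples. The sketch below is only an illustration: the sample values and the helper name 'combine_rows' are made up, and it assumes the two lists line up one to one (every player row has exactly one logo link). If some rows have no logo, the lists would drift out of step, and looking up the link inside each 'tr' while building the row would probably be safer.

```python
import csv
import io

# Hypothetical sample values standing in for the scraped lists.
player_data = [
    ["Player A", "PG", "Hometown A", "School A"],
    ["Player B", "C", "Hometown B", "School B"],
]
logo_links = [
    "http://example.com/school-a",
    "http://example.com/school-b",
]


def combine_rows(rows, links):
    """Append each link to its row so every value gets its own CSV column."""
    return [row + [link] for row, link in zip(rows, links)]


combined = combine_rows(player_data, logo_links)

# Written to an in-memory buffer for the example; a real script would
# pass a file opened with newline='' to csv.writer instead.
buffer = io.StringIO()
csv.writer(buffer).writerows(combined)
csv_text = buffer.getvalue()
```

Because each element of 'combined' is a flat list rather than a (row, link) tuple, csv.writer puts every table cell in its own column and the logo link in the last one.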