一直使用美麗的湯來遍歷頁面,但無論出於什麼原因,我無法讓循環超出第一頁。它似乎應該很容易,因爲它是一個文本字符串,但它似乎循環回來,也許它是我的結構而不是我的文本字符串?循環卡在第一頁
這是我有:
import csv
import urllib2
from bs4 import BeautifulSoup
f = open('nhlstats.csv', "w")
groups=['points', 'shooting', 'goaltending', 'defensive', 'timeonice', 'faceoffs', 'minor-penalties', 'major-penalties']
year = ["2016", "2015","2014","2013","2012"]
for yr in year:
for gr in groups:
url = "http://www.espn.com/nhl/statistics/player/_/stat/points/year/"+str(yr)
#www.espn.com/nhl/statistics/player/_/stat/points/year/2014/
page = urllib2.urlopen(url)
soup=BeautifulSoup(page, "html.parser")
pagecount = soup.findAll(attrs= {"class":"page-numbers"})[0].string
pageliteral = int(pagecount[5:])
for i in range(0,pageliteral):
number = int(((i*40) + 1))
URL = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/"+str(yr) + "/count/"+str(number)
page = urllib2.urlopen(url)
soup=BeautifulSoup(page, "html.parser")
for tr in soup.select("#my-players-table tr[class*=player]"):
row =[]
for ob in range(1,15):
player_info = tr('td')[ob].get_text(strip=True)
row.append(player_info)
f.write(str(yr) +","+",".join(row) + "\n")
f.close()
這個遍地得到相同的前40條記錄。
我嘗試使用this solution彷彿一個並沒有發現這樣做
prevLink = soup.select('a[rel="nofollow"]')[0]
newurl = "http:" + prevLink.get('href')
沒有更好的工作,但我不知道怎麼做循環,它前進的方式?可能只是累了,但我的循環仍然只是去下一組記錄和卡住那之一。請幫我解決我的環
UPDATE
我的格式是在複製粘貼丟失,我實際的代碼如下所示:
import csv
import urllib2
from bs4 import BeautifulSoup
f = open('nhlstats.csv', "w")
groups=['points', 'shooting', 'goaltending', 'defensive', 'timeonice', 'faceoffs', 'minor-penalties', 'major-penalties']
year = ["2016", "2015","2014","2013","2012"]
for yr in year:
for gr in groups:
url = "http://www.espn.com/nhl/statistics/player/_/stat/points/year/"+str(yr)
#www.espn.com/nhl/statistics/player/_/stat/points/year/2014/
page = urllib2.urlopen(url)
soup=BeautifulSoup(page, "html.parser")
pagecount = soup.findAll(attrs= {"class":"page-numbers"})[0].string
pageliteral = int(pagecount[5:])
for i in range(0,pageliteral):
number = int(((i*40) + 1))
URL = "http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/"+str(yr) + "/count/"+str(number)
page = urllib2.urlopen(url)
soup=BeautifulSoup(page, "html.parser")
for tr in soup.select("#my-players-table tr[class*=player]"):
row =[]
for ob in range(1,15):
player_info = tr('td')[ob].get_text(strip=True)
row.append(player_info)
f.write(str(yr) +","+",".join(row) + "\n")
f.close()
我看不出你如何得到第一個頁面都沒有。這正是你的代碼看起來的樣子嗎? –
人們評論的縮進錯誤實際上只是從我的複製粘貼丟失格式化到stackoverflow ...更新與實際格式。 – ike