Accessing commented-out HTML lines with BeautifulSoup
I am trying to web-scrape stats from this specific page: https://www.sports-reference.com/cfb/schools/louisville/2016/gamelog/
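However, when I look at the HTML source, the "Defensive Game Log" table appears to be commented out (it sits inside <!-- ... -->). Because of this, when I use BeautifulSoup4, the code below grabs only the offensive data, which is not commented out, and misses the defensive data, which is.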
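A minimal sketch of why this happens, using made-up markup rather than the real page (the `offense`/`defense` ids are assumptions for illustration): BeautifulSoup does not parse the markup inside an HTML comment into tags. It keeps the whole comment as a single `Comment` string, so `find_all("table")` never sees the hidden table.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page: one visible table and one table hidden inside an HTML comment.
html = """
<div id="all_offense"><table id="offense"><tr><td>42</td></tr></table></div>
<div id="all_defense"><!--
<table id="defense"><tr><td>17</td></tr></table>
--></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Only the uncommented table is parsed as a <table> tag.
visible_tables = [t.get("id") for t in soup.find_all("table")]
print(visible_tables)  # ['offense']

# The commented-out markup survives only as one Comment string.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
print(len(comments))  # 1
print("<table" in comments[0])  # True
```

The sketch uses the standard-library `html.parser` backend so it runs without lxml; lxml treats comments the same way.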
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

accessurl = 'https://www.sports-reference.com/cfb/schools/oklahoma-state/2016/gamelog/'
req = Request(accessurl)
link = urlopen(req)
soup = BeautifulSoup(link.read(), "lxml")

# Only tables outside HTML comments are parsed as tags, so the first one found is the offensive log
tables = soup.find_all('table')
my_table = tables[0]
rows = my_table.find_all('tr')
for row in rows:
    cells = row.find_all('td')
    for cell in cells:
        value = cell.string
        print(value)
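A sketch of collecting one parsed table into a list of rows instead of printing cell by cell, run on hypothetical markup. It uses `get_text(strip=True)` rather than `.string`, because `.string` returns `None` whenever a cell has more than one child node.

```python
from bs4 import BeautifulSoup


def table_to_rows(table):
    """Return each <tr> that has <td> cells as a list of stripped cell texts."""
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if cells:  # skip header rows that contain only <th> cells
            rows.append([td.get_text(strip=True) for td in cells])
    return rows


# Hypothetical markup standing in for one game-log table.
html = """
<table id="offense">
  <thead><tr><th>Date</th><th>Pts</th></tr></thead>
  <tbody>
    <tr><th>1</th><td><a href="#">2016-09-01</a></td><td>70</td></tr>
    <tr><th>2</th><td>2016-09-10</td><td><b>59</b> </td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")
offense = table_to_rows(table)
print(offense)  # [['2016-09-01', '70'], ['2016-09-10', '59']]

# The last cell has two children (<b> and a space), so .string gives None.
print(soup.find_all("td")[-1].string)  # None
```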
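I'm curious whether there is any solution, inside or outside BeautifulSoup4, that can add all of the defensive values to a list the same way the offensive data is stored. Thanks!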
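One common approach, sketched here under assumptions: find the `Comment` that contains the table, re-parse its text with BeautifulSoup, and then read the rows as usual. The table id `defense` and the markup below are hypothetical; check the real id in the page source before relying on it. The helper names `find_commented_table` and `table_rows` are also made up for this sketch.

```python
from bs4 import BeautifulSoup, Comment


def find_commented_table(soup, table_id):
    """Return the <table id=table_id> hidden inside an HTML comment, or None."""
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if f'id="{table_id}"' in comment:
            # Re-parse the comment's text as regular HTML so its tags become searchable.
            inner = BeautifulSoup(str(comment), "html.parser")
            return inner.find("table", id=table_id)
    return None


def table_rows(table):
    """Collect the stripped text of every <td> in each body row that has cells."""
    body = table.find("tbody") or table
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in body.find_all("tr")
        if tr.find_all("td")
    ]


# Hypothetical markup mimicking a page whose defensive table is commented out.
html = """
<table id="offense"><tbody><tr><th>1</th><td>2016-09-01</td><td>70</td></tr></tbody></table>
<div id="all_defense"><!--
<table id="defense"><tbody>
<tr><th>1</th><td>2016-09-01</td><td>17</td></tr>
<tr><th>2</th><td>2016-09-10</td><td>24</td></tr>
</tbody></table>
--></div>
"""

soup = BeautifulSoup(html, "html.parser")
defensive_log = find_commented_table(soup, "defense")
defense = table_rows(defensive_log)
print(defense)  # [['2016-09-01', '17'], ['2016-09-10', '24']]
```

Against the live page, the same helpers would take the soup built from `urlopen(req).read()`; whether the site serves the table inside a comment at request time is worth confirming in the raw source first.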
Note: I added to the solution below, which is derived from here:
# defensive_log: the defensive <table> obtained from the commented-out HTML (not shown here)
data = []
table = defensive_log
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # Get rid of empty values
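A design note, shown as a sketch on hypothetical markup (the column values are invented): filtering out empty strings inside a row shifts every later value one column to the left, so the rows no longer line up with the headers. A variant that keeps blank cells and skips only rows that have no non-empty `<td>` could look like this (`rows_keep_blanks` is a made-up name):

```python
from bs4 import BeautifulSoup


def rows_keep_blanks(table):
    """Keep blank cells so columns stay aligned; drop rows with no non-empty <td>."""
    data = []
    body = table.find("tbody") or table  # tolerate tables without a <tbody>
    for row in body.find_all("tr"):
        cols = [td.get_text(strip=True) for td in row.find_all("td")]
        if any(cols):  # a header-only row has no <td>, so cols is [] and is skipped
            data.append(cols)
    return data


# Hypothetical rows: the first has an empty middle cell, the second is a header row.
html = """
<table><tbody>
<tr><td>2016-09-01</td><td></td><td>3</td></tr>
<tr><th>Date</th><th>Sacks</th><th>Int</th></tr>
<tr><td>2016-09-10</td><td>2</td><td>1</td></tr>
</tbody></table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
kept = rows_keep_blanks(table)
print(kept)  # [['2016-09-01', '', '3'], ['2016-09-10', '2', '1']]

# The per-cell filter from the question drops the blank, so '3' moves into column 2.
filtered = [ele for ele in ['2016-09-01', '', '3'] if ele]
print(filtered)  # ['2016-09-01', '3']
```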
What do you mean by "commented out"? – snapcrack