Scraping table data into a DataFrame

An example URL is http://www.hockey-reference.com/players/c/crosbsi01/gamelog/2016

I am trying to grab the table named Regular Season.

In earlier cases, all I needed to do was something like this:

import requests
from bs4 import BeautifulSoup, NavigableString
import pandas as pd


url = 'http://www.hockey-reference.com/players/o/ovechal01/gamelog/2016' 
resultsPage = requests.get(url) 
soup = BeautifulSoup(resultsPage.text, "html5lib") 
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "Regular Season Table" in x) 
df = pd.read_html(comment)

This is the kind of approach I have used on similar sites, but I cannot find the right table on this page. I am not sure what I am missing.


2016-10-20 Ravash Jalil

The table has an id you can select on:

import requests 
from bs4 import BeautifulSoup 


url = 'http://www.hockey-reference.com/players/o/ovechal01/gamelog/2016' 
resultsPage = requests.get(url) 
soup = BeautifulSoup(resultsPage.text, "html5lib") 
table = soup.select_one("#gamelog") 
print(table)

Or using only pandas:

import pandas as pd
df = pd.read_html(url, attrs={'id': 'gamelog'})[0]  # read_html returns a list of DataFrames
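As a minimal offline sketch of the pandas-only route: the snippet below runs `pd.read_html` against a small inline HTML string instead of the live page. The markup, the second table with id "other", and the column values are made up for illustration. Only the id "gamelog" and the caption text come from the thread. It shows that `attrs` narrows the match to one table and that the result is still a list.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the real page: two tables, only one has id="gamelog".
html = """
<table id="other"><tr><th>X</th></tr><tr><td>1</td></tr></table>
<table id="gamelog">
  <caption>Regular Season Table</caption>
  <thead><tr><th>Date</th><th>G</th><th>A</th></tr></thead>
  <tbody>
    <tr><td>2015-10-10</td><td>1</td><td>0</td></tr>
    <tr><td>2015-10-13</td><td>0</td><td>2</td></tr>
  </tbody>
</table>
"""

# attrs filters on the <table> element's attributes; the return value is a list.
tables = pd.read_html(StringIO(html), attrs={"id": "gamelog"})
df = tables[0]
print(df)
```

Note that `pd.read_html` needs an HTML parser installed (lxml, or BeautifulSoup together with html5lib).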

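Your code could never work because you are searching for a NavigableString, and the string you match sits inside the caption tag, <caption>Regular Season Table</caption>, not the table itself. You would need to call find_previous to get the table: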

comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "Regular Season Table" in x) 
table = comment.find_previous("table")

You could also use table = comment.parent.parent, but find_previous is the better approach.
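The caption-based lookup can be sketched end to end on an inline HTML string, so it runs without a network request. The markup and cell values are hypothetical, and the built-in "html.parser" is used here in place of html5lib only to avoid an extra dependency. The sketch matches the caption text, walks back to the enclosing table with `find_previous`, and then builds a DataFrame from the rows by hand.

```python
import pandas as pd
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup shaped like the table discussed above.
html = """
<table id="gamelog">
  <caption>Regular Season Table</caption>
  <thead><tr><th>Date</th><th>G</th><th>A</th></tr></thead>
  <tbody>
    <tr><td>2015-10-10</td><td>1</td><td>0</td></tr>
    <tr><td>2015-10-13</td><td>0</td><td>2</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# This matches the text node inside <caption>, not the <table> element.
caption_text = soup.find(
    string=lambda s: isinstance(s, NavigableString) and "Regular Season Table" in s
)

# Walk backwards in document order to the nearest <table>, which encloses the caption.
table = caption_text.find_previous("table")

# Collect the header names and the body rows, then build the DataFrame.
header = [th.get_text(strip=True) for th in table.find("thead").find_all("th")]
rows = [
    [td.get_text(strip=True) for td in tr.find_all("td")]
    for tr in table.find("tbody").find_all("tr")
]
df = pd.DataFrame(rows, columns=header)
print(df)
```

Cells extracted this way are strings, so numeric columns would still need a conversion such as `pd.to_numeric` before doing arithmetic on them.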


2016-10-20 19:40:02
