這種情況下的問題是「Team Stats」表格位於您使用requests
下載的HTML源代碼中的評論內。找到註釋,並用BeautifulSoup
重新分析它變成一個「湯」對象:
import requests
from bs4 import BeautifulSoup, NavigableString
url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x)
soup = BeautifulSoup(comment, "html5lib")
table = soup.find("table", id="team_stats")
print(table)
和/或可以加載表入裏,例如pandas
dataframe這是非常方便與合作:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from bs4 import NavigableString
url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})
soup = BeautifulSoup(page.content, "html5lib")
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x)
df = pd.read_html(comment)[0]
print(df)
打印:
Unnamed: 0 DEN CAR
0 First Downs 11 21
1 Rush-Yds-TDs 28-90-1 27-118-1
2 Cmp-Att-Yd-TD-INT 13-23-141-0-1 18-41-265-0-1
3 Sacked-Yards 5-37 7-68
4 Net Pass Yards 104 197
5 Total Yards 194 315
6 Fumbles-Lost 3-1 4-3
7 Turnovers 2 4
8 Penalties-Yards 6-51 12-102
9 Third Down Conv. 1-14 3-15
10 Fourth Down Conv. 0-0 0-0
11 Time of Possession 27:13 32:47
你到底要刮?桌子? –
爲了獲得有效的幫助,您需要提供更多的信息(在您的原始文章中,而不是在可能無法看到的評論中)。不工作:不運行?或者,運行,但給出不正確的結果?你在期待什麼?發生什麼事?還包括任何錯誤消息(如果適用)。此外,看起來像你缺少一些'進口'陳述 – Levon
其裝載的JavaScript ...所以你將需要像ghost.js或硒somertign ... –