Scraping with Python/BS4

I want to scrape the "Team Stats" table from http://www.pro-football-reference.com/boxscores/201602070den.htm using BS4 and Python 2.7, but I haven't been able to get anywhere close to it.

url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm' 
page = requests.get(url) 
soup = BeautifulSoup(page.text, "html5lib") 
table=soup.findAll('table', {'id':"team_stats", "class":"stats_table"}) 
print table

I thought code like the above would work, but no luck.

Source

2016-07-25 Ravash Jalil

What exactly are you trying to scrape? The table? –

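To get useful help, you need to provide more information (in your original post, not in comments where it may go unseen). "Not working" how: does it fail to run, or does it run but give incorrect results? What are you expecting, and what happens instead? Include any error messages, if applicable. Also, it looks like you are missing some `import` statements – Levon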

It's loaded by JavaScript ... so you will need something like ghost.js or selenium or something ... –

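The problem in this case is that the "Team Stats" table sits inside a comment in the HTML source that you download with requests, so a direct search for the table finds nothing. Locate the comment and reparse it with BeautifulSoup into a "soup" object: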

import requests
from bs4 import BeautifulSoup, NavigableString

url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm'
# Send a browser-like User-Agent header with the request
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})

soup = BeautifulSoup(page.content, "html5lib")
# Find the first text node (HTML comments are text nodes too) that mentions "team_stats"
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x)

# Parse the comment's contents as HTML so the table becomes a real element
soup = BeautifulSoup(comment, "html5lib")
table = soup.find("table", id="team_stats")
print(table)
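To see the comment trick in isolation, here is a minimal offline sketch. The `SAMPLE_HTML` string and the `find_commented_table` helper are hypothetical stand-ins made up for illustration, not the real page or part of BeautifulSoup. It also filters on bs4's `Comment` class (a subclass of `NavigableString`) and uses the built-in `html.parser`. Both are assumptions that narrow the search to actual comments and avoid the html5lib dependency.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for the downloaded page: the table exists
# only inside an HTML comment, so it is not part of the element tree.
SAMPLE_HTML = """
<html><body>
<div id="all_team_stats">
<!--
<table id="team_stats" class="stats_table">
  <tr><th></th><th>DEN</th><th>CAR</th></tr>
  <tr><th>First Downs</th><td>11</td><td>21</td></tr>
</table>
-->
</div>
</body></html>
"""


def find_commented_table(html, table_id):
    """Return the <table> with the given id hidden in an HTML comment, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Walk over comment nodes only, skipping ordinary text nodes.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if table_id not in comment:
            continue
        # Reparse the comment body so its markup becomes real elements.
        table = BeautifulSoup(comment, "html.parser").find("table", id=table_id)
        if table is not None:
            return table
    return None


# A direct lookup fails because the table is commented out.
direct = BeautifulSoup(SAMPLE_HTML, "html.parser").find("table", id="team_stats")
print(direct)

table = find_commented_table(SAMPLE_HTML, "team_stats")
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(rows)
```

The first `print` shows `None`, which matches the behaviour described in the question. The second shows the cell text recovered from the comment.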

And/or you can load the table into, for example, a pandas DataFrame, which is very convenient to work with:

import pandas as pd 
import requests 
from bs4 import BeautifulSoup 
from bs4 import NavigableString 

url = 'http://www.pro-football-reference.com/boxscores/201602070den.htm' 
page = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}) 

soup = BeautifulSoup(page.content, "html5lib") 
comment = soup.find(text=lambda x: isinstance(x, NavigableString) and "team_stats" in x) 

df = pd.read_html(comment)[0] 
print(df)

Prints:

            Unnamed: 0            DEN            CAR
0          First Downs             11             21
1         Rush-Yds-TDs        28-90-1       27-118-1
2    Cmp-Att-Yd-TD-INT  13-23-141-0-1  18-41-265-0-1
3         Sacked-Yards           5-37           7-68
4       Net Pass Yards            104            197
5          Total Yards            194            315
6         Fumbles-Lost            3-1            4-3
7            Turnovers              2              4
8      Penalties-Yards           6-51         12-102
9     Third Down Conv.           1-14           3-15
10   Fourth Down Conv.            0-0            0-0
11  Time of Possession          27:13          32:47
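One note for newer environments: pandas 2.1 and later emit a deprecation warning when a literal HTML string is passed to `pd.read_html`, and suggest wrapping it in `StringIO` instead. The sketch below shows that form on a small hypothetical table. `SAMPLE_TABLE` is made up for illustration, with two rows borrowed from the output above. It assumes a parser backend for `read_html` is installed (lxml, or bs4 together with html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the text of the extracted HTML comment.
SAMPLE_TABLE = """
<table id="team_stats">
  <thead>
    <tr><th></th><th>DEN</th><th>CAR</th></tr>
  </thead>
  <tbody>
    <tr><th>First Downs</th><td>11</td><td>21</td></tr>
    <tr><th>Total Yards</th><td>194</td><td>315</td></tr>
  </tbody>
</table>
"""

# Wrapping the markup in StringIO hands read_html a file-like object,
# which avoids the literal-string deprecation warning in recent pandas.
df = pd.read_html(StringIO(SAMPLE_TABLE))[0]
print(df)
```

With the real page, the same wrapping would apply to the comment text, for example `pd.read_html(StringIO(str(comment)))`.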

Source

2016-07-25 18:47:51 alecxe

Wow, it will take me a while to understand what's going on here, haha. Thanks. –
