Scraping the link from a button on a page

I want to scrape the link from the "Box Score" button on this page. The link behind the button should look like this:

http://www.espn.com/nfl/boxscore?gameId=400874795

I tried the following code to see whether I could reach the button, but I couldn't.

from bs4 import BeautifulSoup
import requests

url = 'http://www.espn.com/nfl/scoreboard/_/year/2016/seasontype/1/week/2'

# Fetch the raw HTML of the scoreboard page and parse it
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# Print every <a> tag found in the response
for link in soup.find_all('a'):
    print(link)


2017-08-02 jhaywoo8

1) Download and inspect the page's raw HTML; 2) find the elements you want to scrape; 3) write Python code that searches for those elements; 4) ??? 5) Profit! – ForceBru

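The problem here is that the HTML you get from the URL isn't actually the page you see when you view it in a browser. A lot of Ajax calls populate the page, so that data doesn't exist yet when you make the initial request. – wpercy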
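One quick way to check this point is to search the raw response text for box score hrefs. The sketch below is only an illustration: the helper name find_boxscore_hrefs and both HTML strings are made up for the example, standing in for r.text and for the browser-rendered page.

```python
import re

# Pattern for box score hrefs such as /nfl/boxscore?gameId=400874795
BOXSCORE_HREF = re.compile(r"/nfl/boxscore\?gameId=\d+")


def find_boxscore_hrefs(html):
    """Return every box score href found in the given HTML text."""
    return BOXSCORE_HREF.findall(html)


# Hypothetical stand-in for r.text when the links are filled in later by JavaScript.
static_html = '<div id="events"></div><script src="app.js"></script>'
# Hypothetical stand-in for the page after the browser has run its scripts.
rendered_html = (
    '<a name="&lpos=nfl:scoreboard:boxscore" '
    'href="/nfl/boxscore?gameId=400874795">Box Score</a>'
)

print(find_boxscore_hrefs(static_html))    # []
print(find_boxscore_hrefs(rendered_html))  # ['/nfl/boxscore?gameId=400874795']
```

If the same call on the real r.text returns an empty list, the links are not part of the initial response and something has to execute the page's JavaScript first.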

Here is a solution I put together; it scrapes all the links from the URL you provided in your question. You can check it out.

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = 'http://www.espn.com/nfl/scoreboard/_/year/2016/seasontype/1/week/2'

# Download the raw HTML of the scoreboard page and parse it
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Collect every <a> tag on the page
tags = soup('a')

for i, tag in enumerate(tags):
    # Print the index and the tag's href (None if the tag has no href)
    print(i)
    ans = tag.get('href', None)
    print(ans)
    print("\n")


2017-08-02 18:15:34

This doesn't get the link from the "box score" button. That's what I need. – jhaywoo8

As wpercy mentioned in his comment, you can't do this with requests alone. As a suggestion, use selenium together with Chromedriver or PhantomJS, which can handle the JavaScript.
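A minimal sketch of that approach, assuming selenium, Chrome and a matching chromedriver are installed. The function names are made up for the example, and the page may need an explicit wait before page_source contains the Ajax-loaded content.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load the page in a real browser so its JavaScript runs, then return the HTML."""
    # Imported here so the parsing helper below can be used without selenium.
    from selenium import webdriver

    browser = webdriver.Chrome()  # assumes Chrome and chromedriver are available
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()


def find_boxscore_anchors(html):
    """Return the <a> tags whose name attribute marks them as box score buttons."""
    soup = BeautifulSoup(html, "html.parser")
    # "name" must go through attrs, because find_all's own name argument is the tag name.
    return soup.find_all("a", attrs={"name": "&lpos=nfl:scoreboard:boxscore"})


def scrape_boxscore_anchors(url):
    """Render the page in the browser and return its box score anchors."""
    return find_boxscore_anchors(fetch_rendered_html(url))
```

Calling boxList = scrape_boxscore_anchors(url) with the scoreboard URL from the question would produce the kind of list used in the rest of this answer.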

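Every box score button's a tag has the attribute name = &lpos=nfl:scoreboard:boxscore, so we first use .findAll to collect those tags into boxList. A simple list comprehension can then extract each href attribute: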

>>> links = [box['href'] for box in boxList] 
>>> links 
['/nfl/boxscore?gameId=400874795', '/nfl/boxscore?gameId=400874854', '/nfl/boxscore?gameId=400874753', '/nfl/boxscore?gameId=400874757', '/nfl/boxscore?gameId=400874772', '/nfl/boxscore?gameId=400874777', '/nfl/boxscore?gameId=400874767', '/nfl/boxscore?gameId=400874812', '/nfl/boxscore?gameId=400874761', '/nfl/boxscore?gameId=400874764', '/nfl/boxscore?gameId=400874781', '/nfl/boxscore?gameId=400874796', '/nfl/boxscore?gameId=400874750', '/nfl/boxscore?gameId=400873867', '/nfl/boxscore?gameId=400874775', '/nfl/boxscore?gameId=400874798']
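The hrefs in that list are relative to the site root. To turn them into full URLs like the one in the question, they can be joined with the scoreboard URL. This is a small sketch using only the standard library; the two links are copied from the output above.

```python
from urllib.parse import urljoin

# The scoreboard page the links were scraped from.
base_url = "http://www.espn.com/nfl/scoreboard/_/year/2016/seasontype/1/week/2"

# First two entries of the scraped list.
links = ["/nfl/boxscore?gameId=400874795", "/nfl/boxscore?gameId=400874854"]

# A leading slash makes urljoin replace the whole path of base_url.
full_links = [urljoin(base_url, href) for href in links]
print(full_links[0])  # http://www.espn.com/nfl/boxscore?gameId=400874795
```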


2017-08-02 23:31:24
