Scraping a redirected website with requests and BeautifulSoup

I am scraping the NBA website with requests and BeautifulSoup4. The URL below redirects when it is entered in a browser, and I assumed requests was the right way to emulate that.

from bs4 import BeautifulSoup 
import requests 

r = requests.get('http://www.nba.com/games/20111225/BOSNYK/boxscore.html') 
soup = BeautifulSoup(r.text, 'html.parser')

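In the browser, this URL actually leads to "http://www.nba.com/games/20111225/BOSNYK/gameinfo.html#nbaGIboxscore".

The problem is that I don't know the keyword for this effect, and I could not find a solution online.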

Source

2017-03-26 Paul Deng

You can use regex or bs4 to find the URL the page redirects to, then use requests to scrape it.

For example:

import bs4
import requests

original_url = 'http://www.nba.com/games/20111225/BOSNYK/'
old_suffix = 'boxscore.html'
r = requests.get(original_url + old_suffix)
site_content = bs4.BeautifulSoup(r.text, 'lxml')
# Assumes the first <meta> tag on the page is the one that carries the redirect
meta = site_content.find_all('meta')[0]
meta_content = meta.attrs.get('content')
# Skip the first 6 characters of the content value (assumed to be a prefix such as "0;url=")
new_suffix = meta_content[6:]
new_url_to_scrape = original_url + new_suffix

Then scrape new_url_to_scrape. Enjoy!
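The fixed slice of 6 characters and the "first meta tag" lookup both depend on the exact markup of that page. A slightly more tolerant way to read the redirect target could look like the sketch below. It assumes the content value has the usual "seconds; url=target" shape, which has not been verified against the NBA page, and resolve_meta_refresh is a hypothetical helper name, not part of requests or bs4.

```python
import re
from urllib.parse import urljoin


def resolve_meta_refresh(content, base_url):
    """Return the absolute URL named in a meta refresh content value, or None.

    Hypothetical helper: assumes the value looks like "0;url=page.html".
    """
    if not content:
        return None
    # Accept optional spaces, optional quotes and any letter case around "url=".
    match = re.search(r"url\s*=\s*['\"]?([^'\"]+)", content, flags=re.IGNORECASE)
    if match is None:
        return None
    # urljoin resolves a relative target against the page it was found on.
    return urljoin(base_url, match.group(1).strip())
```

With the code above, this could be called as resolve_meta_refresh(meta.attrs.get('content'), r.url), and the result passed to requests.get. Resolving against r.url rather than concatenating strings also covers the case where the target is already an absolute URL.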

Source

2017-03-26 14:20:57

Nice, I see it now!
