Get all links from a single-page website with BeautifulSoup ('Load More' functionality)

I want to scrape all the links from a website that has no pagination: instead there is a 'LOAD MORE' button, and the URL does not change no matter how much data you request.

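When I parse the page with BeautifulSoup and ask for all the links, it only returns the links on the site's initial first page. I can reach older content manually by clicking the 'LOAD MORE' button, but is there a way to do this programmatically?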

This is what I mean:

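from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen
from bs4 import BeautifulSoup

page = urlopen('http://www.thedailybeast.com/politics.html')
soup = BeautifulSoup(page, 'html.parser')

# Only links present in the initially served HTML are found here
for link in soup.find_all('a'):
    print(link.get('href'))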

Unfortunately, there is no URL responsible for the pagination.


2016-03-07 Zlo

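When you click the 'Load More' button, an XHR request is sent to the http://www.thedailybeast.com/politics.view.<page_number>.json endpoint. You need to simulate that request in your code and parse the JSON response. A working example using requests: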

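import requests

with requests.Session() as session:
    # Pages 1-9; each page is the JSON the 'Load More' button would fetch
    for page in range(1, 10):
        print("Page number #%s" % page)
        response = session.get("http://www.thedailybeast.com/politics.view.%s.json" % page)
        response.raise_for_status()  # fail loudly on HTTP errors
        data = response.json()

        # Each entry in the "stream" list is one article
        for article in data["stream"]:
            print(article["title"])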

Prints:

Page number #1 
The Two Americas Behind Donald Trump and Bernie Sanders 
... 
Hillary Clinton’s Star-Studded NYC Bash: Katy Perry, Jamie Foxx, and More Toast the Candidate 
Why Do These Republicans Hate Maya Angelou’s Post Office? 
Page number #2 
No, Joe Biden Is Not a Supreme Court Hypocrite 
PC Hysteria Claims Another Professor 
WHY BLACK CELEB ENDORSEMENTS MATTER MOST 
... 
Inside Trump’s Make Believe Presidential Addresses 
...
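The answer's loop stops after a fixed range(1, 10). A minimal sketch that instead keeps requesting pages until the content runs out is shown here. It assumes (not confirmed by the answer) that the endpoint returns an empty or missing "stream" list, or an HTTP 404, once there are no more pages; max_pages is a safety cap in case that assumption is wrong. The names iter_stream and fetch_politics_page are hypothetical.

```python
from typing import Callable, Iterator

URL = "http://www.thedailybeast.com/politics.view.%s.json"


def iter_stream(fetch_page: Callable[[int], dict], start: int = 1,
                max_pages: int = 100) -> Iterator[dict]:
    """Yield every item of data["stream"], page by page.

    fetch_page(n) must return the decoded JSON for page n. Stops at the
    first page whose "stream" is missing or empty (assumed to mean there is
    no more content), or after max_pages pages as a safety limit.
    """
    for page in range(start, start + max_pages):
        items = fetch_page(page).get("stream") or []
        if not items:
            break
        yield from items


def fetch_politics_page(page: int) -> dict:
    """Hypothetical fetcher for one page of the JSON endpoint."""
    import requests  # third-party; imported only when a page is fetched
    response = requests.get(URL % page, timeout=10)
    if response.status_code == 404:
        return {}  # assumption: a missing page means we ran out of content
    response.raise_for_status()
    return response.json()


# Usage (performs network requests):
#     for article in iter_stream(fetch_politics_page):
#         print(article["title"])
```

Passing the fetcher in as a function keeps the stopping logic separate from the HTTP details, so the same loop works with a requests.Session or with a stub when testing.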


2016-03-07 16:52:09 alecxe

Thank you, very helpful! – Zlo
