Scraping data after clicking a button

-1

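I want to scrape data from this web page: https://www.youtube.com/playlist?list=PLMC9KNkIncKtPzgY-5rmhvj7fax8fdxoj

At the end of the page there is a "Load more" button that loads more videos.

The page shows only 100 videos, but I want to parse the data that appears after clicking the "Load more" button.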

<button class="yt-uix-button yt-uix-button-size-default yt-uix-button-default load-more-button yt-uix-load-more browse-items-load-more-button" type="button" onclick=";return false;" aria-label="Load more 
" data-uix-load-more-target-id="pl-load-more-destination" data-uix-load-more-href="/browse_ajax?action_continuation=1&amp;continuation=4qmFsgIuEiRWTFBMTUM5S05rSW5jS3RQemdZLTVybWh2ajdmYXg4ZmR4b2oaBkNHVSUzRA%253D%253D"><span class="yt-uix-button-content"> <span class="load-more-loading hid"> 
     <span class="yt-spinner"> 
     <span class="yt-spinner-img yt-sprite" title="Loading icon"></span> 

Loading... 
    </span> 

    </span> 
    <span class="load-more-text"> 
    Load more 

    </span> 
</span></button>

How can I do this? I am using Beautiful Soup.
Edit: I found 2 solutions. One uses Beautiful Soup and the other uses Selenium.

Source

2016-07-14 John Doe

-1

I used the code below to get the video titles, and it works. You can edit it to scrape other content.

from bs4 import BeautifulSoup
import json
import requests

# First page of the playlist (only the first 100 videos are in this HTML)
url = "https://www.youtube.com/playlist?list=PLMC9KNkIncKtPzgY-5rmhvj7fax8fdxoj"
html = requests.get(url).text

soup = BeautifulSoup(html, "lxml")
links = soup.find_all(class_='pl-video-title')

for vid in links:
    print(vid.contents[1].string)

# URL taken from the button's data-uix-load-more-href attribute; it returns JSON
url1 = "https://www.youtube.com/browse_ajax?action_continuation=1&continuation=4qmFsgIuEiRWTFBMTUM5S05rSW5jS3RQemdZLTVybWh2ajdmYXg4ZmR4b2oaBkNHVSUzRA%3D%3D"
html1 = requests.get(url1).text
data = json.loads(html1)

# The next batch of rows is an HTML fragment stored under 'content_html'
soup = BeautifulSoup(data['content_html'], "lxml")
links = soup.find_all(class_='pl-video-title')

for vid in links:
    print(vid.contents[1].string)
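
The code above fetches one extra batch from a hard-coded URL. A rough sketch of following the button repeatedly is given here, under stated assumptions: the names `LoadMoreFinder`, `find_load_more_href` and `iter_fragments` are made up for illustration, the download function is passed in by the caller, and the `load_more_widget_html` key is an assumption about the JSON response that the code above does not confirm (only `content_html` appears there). The attribute lookup uses the standard library's `html.parser` so the sketch runs without extra packages.

```python
from html.parser import HTMLParser


class LoadMoreFinder(HTMLParser):
    """Records the data-uix-load-more-href of the first element that has one."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if self.href is None:
            # attrs is a list of (name, value) pairs; entities such as &amp;
            # in the values have already been unescaped by the parser
            self.href = dict(attrs).get("data-uix-load-more-href")


def find_load_more_href(html):
    """Return the continuation path from the load-more button, or None."""
    finder = LoadMoreFinder()
    finder.feed(html)
    return finder.href


def iter_fragments(first_page_html, fetch_json, max_pages=50):
    """Yield the first page, then each content_html fragment in turn."""
    yield first_page_html
    href = find_load_more_href(first_page_html)
    for _ in range(max_pages):  # hard cap so a bad response cannot loop forever
        if not href:
            break
        data = fetch_json("https://www.youtube.com" + href)
        yield data.get("content_html", "")
        # ASSUMPTION: the markup of the next button arrives under this key
        href = find_load_more_href(data.get("load_more_widget_html", ""))
```

With `requests`, the download function could be `lambda url: requests.get(url).json()`, and each yielded fragment can be handed to `BeautifulSoup` exactly as in the code above.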

Source

2016-07-14 05:10:15 shiva

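Thanks. I have a question: why did you use json.loads(html1)? Why does it return the data in JSON format? Also, when I type that URL into the browser, a file named browser_ajax is downloaded. Shouldn't it open a web page? –

When you go to that URL, the server returns a `json` response, which is the reason for `json.loads()`. – shiva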
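
To illustrate that reply, here is a minimal sketch with a hand-written payload standing in for the server response. Only the `content_html` key comes from the answer's code; the sample markup and title are invented, and no request is made.

```python
import json

# Hypothetical stand-in for the text returned by the browse_ajax URL.
# Only the "content_html" key is taken from the answer; the markup is invented.
sample_response = json.dumps({
    "content_html": '<tr><td class="pl-video-title"><a>Example title</a></td></tr>'
})

# The body is a JSON document rather than a web page, so it is decoded first
data = json.loads(sample_response)

# The decoded value is a dict; the HTML to parse is a string stored under one key
fragment = data["content_html"]

print(type(data).__name__)
print(fragment)
```

Because the response body is JSON rather than an HTML page, it is presumably saved by the browser as a file instead of being rendered.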

Thanks. –

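You can retrieve a web page element from a BeautifulSoup object by calling the select() method and passing the CSS selector of the element you are looking for as a string.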

soup.select('span .load-more-text')

I believe this should do what you are trying to do.
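
As a small illustration of what select() returns, the sketch below runs it on a trimmed copy of the button markup from the question. `TOKEN` is a placeholder for the real continuation value, and the `html.parser` backend is chosen only so that lxml is not needed. Note that selecting an element only locates it in the HTML already downloaded; it does not click the button or load more rows.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the button markup from the question; TOKEN is a placeholder
html = (
    '<button class="load-more-button" '
    'data-uix-load-more-href="/browse_ajax?action_continuation=1&amp;continuation=TOKEN">'
    '<span class="yt-uix-button-content">'
    '<span class="load-more-text"> Load more </span>'
    '</span></button>'
)
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of every matching Tag (an empty list if none match)
matches = soup.select("span .load-more-text")
label = matches[0].get_text(strip=True)

# select_one() returns the first match or None; attributes are read like dict keys
button = soup.select_one("button.load-more-button")
href = button["data-uix-load-more-href"]

print(label)
print(href)
```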

Source

2016-07-14 03:52:50 Carolyn

You did not understand the question. I want to scrape the content of this web page after I click the Load more button. –

-1

The best way to read a playlist is to use the YouTube API.
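
A minimal sketch of that route, assuming the YouTube Data API v3 `playlistItems` endpoint and an API key you have obtained yourself. The function names `fetch_json` and `playlist_titles` are made up for this example, and only the standard library is used.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/youtube/v3/playlistItems"


def fetch_json(url):
    """Download a URL and decode its JSON body."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def playlist_titles(playlist_id, api_key, fetch=fetch_json):
    """Return every video title in the playlist, following nextPageToken."""
    titles = []
    page_token = None
    while True:
        params = {
            "part": "snippet",
            "playlistId": playlist_id,
            "maxResults": 50,  # largest page size this endpoint accepts
            "key": api_key,
        }
        if page_token:
            params["pageToken"] = page_token
        data = fetch(API_URL + "?" + urlencode(params))
        titles.extend(item["snippet"]["title"] for item in data.get("items", []))
        # The last page carries no nextPageToken, which ends the loop
        page_token = data.get("nextPageToken")
        if not page_token:
            return titles
```

It could be called as `playlist_titles("PLMC9KNkIncKtPzgY-5rmhvj7fax8fdxoj", "YOUR_API_KEY")`, where the key is a placeholder. Paging is handled by the API itself, so no button has to be clicked.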

However, if for some reason you cannot use it, you will want a crawler that can also interact with the page. selenium is a good example:

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support import expected_conditions as EC 
from selenium.webdriver.support.ui import WebDriverWait 

driver = webdriver.Firefox() 
driver.get("https://www.youtube.com/playlist?list=PLMC9KNkIncKtPzgY-5rmhvj7fax8fdxoj") # Get the playlist page 

# Click the button 
load_more_button = driver.find_element(By.CLASS_NAME, "load-more-text")  # the find_element_by_* helpers were removed in Selenium 4
load_more_button.click() 

# Wait *up to* 10 seconds to make sure the page has finished loading (check that the button no longer exists) 
WebDriverWait(driver,10).until(EC.invisibility_of_element_located(
    (By.CLASS_NAME, "load-more-text"))) 
# Get the html 
html = driver.page_source

From this point on, you can parse the HTML just as you would the HTML returned by requests.

Source

2016-07-14 05:32:23

Thanks. This will be useful to me later, when I start using Selenium. –
