Why does the BeautifulSoup4 find_all function return nothing?

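I'm new to Beautiful Soup 4, and I can't get this simple code to grab the contents of a tag when I search for something on YouTube. When I print containers, it just prints "[]", which I assume is an empty list. Any idea why this isn't picking anything up? Is it because I'm not grabbing the right tag on YouTube? In the search results HTML, one result has the following tag: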

<a id="video-title" class="yt-simple-endpoint style-scope ytd-video-renderer" aria-label="Kendrick Lamar - HUMBLE. by KendrickLamarVEVO 5 months ago 3 minutes, 4 seconds 322,571,817 views" href="https://www.youtube.com/watch?v=tvTRZJ-4EyI" title="Kendrick Lamar - HUMBLE."> 
       Kendrick Lamar - HUMBLE. 
       </a>

Python code:

import bs4 

from urllib.request import urlopen as uReq 
from bs4 import BeautifulSoup as soup 

search = "damn" 
my_url = "https://www.youtube.com/results?search_query=" + search 
uClient = uReq(my_url) 
page_html = uClient.read() 
uClient.close() 

#html parsing 
page_soup = soup(page_html, "html.parser") 

containers = page_soup.find_all("a",{"id":"video-title"}) 
print(containers) 

#result-count


2017-09-23 douglasrcjames

Works fine here. Have you checked that 'page_html' contains what you expect? (Also, 'page_soup.find(id="video-title")' would be simpler.) – Ryan
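Ryan's two suggestions can be combined into a quick check: first confirm that the raw bytes contain the id at all, then look it up with `find`, which returns the first matching tag or `None` rather than a list. This is a minimal sketch; the helper name `find_video_title` and the two sample pages are hypothetical, used only so it runs without a network call.

```python
from bs4 import BeautifulSoup


def find_video_title(page_html: bytes, element_id: str = "video-title"):
    """Return the text of the first tag with the given id, or None."""
    # Cheap substring check on the raw response before blaming the parser:
    # if the id is not in the bytes urllib returned, no parser will find it.
    if element_id.encode("utf-8") not in page_html:
        return None
    soup = BeautifulSoup(page_html, "html.parser")
    # find() returns the first match or None; find_all() would return a list.
    tag = soup.find(id=element_id)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical pages: one with the link in the HTML, one that only ships a script.
static_page = b'<a id="video-title" href="/watch?v=tvTRZJ-4EyI">Kendrick Lamar - HUMBLE.</a>'
dynamic_page = b'<div id="contents"></div><script src="app.js"></script>'

print(find_video_title(static_page))   # Kendrick Lamar - HUMBLE.
print(find_video_title(dynamic_page))  # None
```

If the second case is what you get for the real 'page_html', the element is being added by JavaScript after the page loads.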

There doesn't seem to be any '<a>' with 'id="video-title"' in page_html. If you want the results on the page, use 'page_soup.find_all("a", {"class": "yt-uix-sessionlink spf-link"})'. – Bijoy
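Bijoy's suggestion passes the full class string `"yt-uix-sessionlink spf-link"`. In BeautifulSoup, a class value containing a space is matched against the exact attribute value, so the order of the classes matters; a single class name or a CSS selector is more forgiving. The sample markup below is hypothetical, modeled on the 2017-era class names mentioned in this thread, which YouTube may no longer use.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes in different orders, plus a non-video link.
html = """
<a class="yt-uix-sessionlink spf-link" href="/watch?v=aaa">First</a>
<a class="spf-link yt-uix-sessionlink" href="/watch?v=bbb">Second</a>
<a class="yt-uix-sessionlink" href="/channel/ccc">Channel</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on the whole class attribute: the class order must be identical.
exact = soup.find_all("a", {"class": "yt-uix-sessionlink spf-link"})
# Single class name: matches any <a> that has this class among its classes.
single = soup.find_all("a", class_="yt-uix-sessionlink")
# CSS selector: requires both classes, in any order.
both = soup.select("a.yt-uix-sessionlink.spf-link")

print([a["href"] for a in exact])   # ['/watch?v=aaa']
print([a["href"] for a in single])  # ['/watch?v=aaa', '/watch?v=bbb', '/channel/ccc']
print([a["href"] for a in both])    # ['/watch?v=aaa', '/watch?v=bbb']
```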

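If you check the page source of the URL, you won't find any id="video-title", which means the page loads that content dynamically. BeautifulSoup only parses the HTML it is given and does not run JavaScript, so it cannot see dynamically loaded content. Try combining it with something like Selenium or scrapyjs; this post may also help.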
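A minimal sketch of the Selenium route, assuming Selenium 4 and a local Chrome install: let the browser run the page's JavaScript, wait until a `video-title` element exists, then hand the rendered DOM to BeautifulSoup. The helper names `render_page` and `extract_titles` are hypothetical, and the `video-title` id is taken from the question's 2017 markup; YouTube may have changed it since.

```python
from bs4 import BeautifulSoup


def extract_titles(html: str) -> list[str]:
    """Parse already-rendered HTML and return the text of every video-title link."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", id="video-title")]


def render_page(url: str, timeout: float = 10.0) -> str:
    """Load url in headless Chrome, wait for a video title, return the rendered DOM."""
    # Imported here so extract_titles() still works where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until JavaScript has inserted at least one element with this id.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "video-title"))
        )
        # page_source is the DOM after scripts ran, not the raw HTTP response.
        return driver.page_source
    finally:
        driver.quit()


# Usage (needs Chrome and network access):
# titles = extract_titles(render_page("https://www.youtube.com/results?search_query=damn"))
```

Keeping the parsing in its own function means it can be tested on saved HTML without starting a browser.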


2017-09-23 07:18:45 Reza

I had a feeling the HTML might be dynamic. Thanks for the reference, I'll check it out! – douglasrcjames

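YouTube loads its results dynamically, so the IDs and class names in the HTML you download differ from the ones you see in the browser. When you parse a page, always read the page source as urllib loads it, not what the browser shows you. This code should solve your problem: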

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen

page = urlopen('https://www.youtube.com/results?search_query=damn').read()
soup = bs(page, 'html.parser')
# Every <a> whose class list includes yt-uix-sessionlink
results = soup.find_all('a', {'class': 'yt-uix-sessionlink'})
for link in results:
    print(link.get("href"))  # None for tags without an href

The code prints every URL on the page, so you will still need to filter them.
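One way to do that filtering, sketched with the standard library only: keep links whose path is `/watch` and that carry a `v` query parameter, make them absolute, and drop duplicates while preserving order. The helper name `video_urls` is hypothetical, and the assumption that video links look like `/watch?v=<id>` comes from the `href` shown in the question.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE = "https://www.youtube.com"


def video_urls(hrefs):
    """Keep only /watch?v=<id> links, made absolute and de-duplicated in order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # link.get("href") returns None for tags without an href
        parts = urlparse(urljoin(BASE, href))
        video_id = parse_qs(parts.query).get("v", [None])[0]
        if parts.path != "/watch" or video_id is None:
            continue  # channel pages, playlists, navigation links, etc.
        url = f"{BASE}/watch?v={video_id}"
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Hypothetical hrefs: a video, the same video with extra params, a channel, and None.
hrefs = ["/watch?v=tvTRZJ-4EyI", "/watch?v=tvTRZJ-4EyI&list=RD", "/channel/UC123", None]
print(video_urls(hrefs))  # ['https://www.youtube.com/watch?v=tvTRZJ-4EyI']
```

With the answer's code, this would be called as `video_urls(link.get("href") for link in results)`.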


2017-09-23 07:52:49

Great, I'll check it out and report back. – douglasrcjames
