Web scraping with BeautifulSoup: TypeError: 'NoneType' object is not callable

I am an absolute beginner. I am trying to use BeautifulSoup to scrape a website. I do get the HTML, but now I want to get all divs with the class content_class.

Here is my attempt:

import requests 
from BeautifulSoup import BeautifulSoup 

#Request the page and parse the HTML 
url = 'mywebsite' 
response = requests.get(url) 
html = response.content 

#Beautiful Soup 
soup = BeautifulSoup(html) 
soup.find_all('div', class_="content_class")

However, this does not work. I get:

Traceback (most recent call last):
  File "scrape.py", line 11, in <module>
    soup.find_all('div', class_="content_class")
TypeError: 'NoneType' object is not callable

What am I doing wrong?


2017-07-02 George Welder

If you put 'print(soup.find_all)' on the second-to-last line, what does it print? – unutbu

So I did 'soup = BeautifulSoup(html)' and then 'print(soup.find_all)', and it printed 'None'. –

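You are using BeautifulSoup version 3, but you appear to be following the documentation for BeautifulSoup version 4. The Element.find_all() method is only available in the latest major version (it is called Element.findAll() in version 3).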
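The error mentions 'NoneType' rather than a missing attribute because BeautifulSoup treats an unknown attribute on a parsed document as a lookup for a child tag of that name, and returns None when no such tag exists. That matches the 'None' printed in the comment above. The following is a minimal sketch of the same mechanism using BeautifulSoup 4; the name 'no_such_method' and the sample markup are made up for illustration, standing in for what 'find_all' is under version 3.

```python
from bs4 import BeautifulSoup

# Sample markup invented for this sketch.
html = "<div class='content_class'>hello</div>"
soup = BeautifulSoup(html, "html.parser")

# 'no_such_method' is a hypothetical name: neither a real method nor a tag
# in the document. BeautifulSoup looks for a <no_such_method> tag, finds
# none, and returns None instead of raising AttributeError.
missing = soup.no_such_method
print(missing)

# Calling that None value reproduces the error from the question.
error_message = None
try:
    soup.no_such_method()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```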

I strongly recommend that you upgrade:

pip install beautifulsoup4

and

from bs4 import BeautifulSoup

Version 3 stopped receiving updates in 2012; it is now severely out of date.


2017-07-02 21:21:02

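Thanks, I did that! But now I get 'The code that caused this warning is on line 10 of the file scrape.py. To get rid of this warning, change code that looks like this: BeautifulSoup(YOUR_MARKUP}) to this: BeautifulSoup(YOUR_MARKUP, "html.parser") markup_type=markup_type))' –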

@GeorgeWelder, just follow the instructions in the warning. You can also simply ignore it. – ForceBru

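@GeorgeWelder: Yes, BeautifulSoup 4 used to pick a parser backend for you automatically, but that can lead to unexpected changes when you later install lxml. You are now asked to make an explicit choice: 'soup = BeautifulSoup(html, 'html.parser')' or 'soup = BeautifulSoup(html, 'lxml')' or 'soup = BeautifulSoup(html, 'html5lib')'. –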
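Putting the upgrade and the explicit parser choice together, a minimal sketch of the corrected search might look like this. It assumes beautifulsoup4 is installed and uses an inline HTML string (invented for this example) in place of the downloaded page, so it runs without a network request.

```python
from bs4 import BeautifulSoup

# Inline sample markup standing in for response.content (an assumption
# made so the sketch is self-contained).
html = """
<html><body>
  <div class="content_class">first</div>
  <div class="other">skipped</div>
  <div class="content_class extra">second</div>
</body></html>
"""

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# class_ matches every div whose class list contains content_class,
# including divs that carry additional classes.
divs = soup.find_all("div", class_="content_class")
texts = [div.get_text() for div in divs]
print(texts)
```

With a real page, 'html' would be 'response.content' as in the question; 'html.parser' ships with Python, while 'lxml' and 'html5lib' need to be installed separately.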

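You get this error because BeautifulSoup version 3, which your import uses, has no 'find_all' method; the method there is called 'findAll'. This code should help: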

soup.findAll('div', {'class': 'content_class'})
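The dictionary form of the class filter shown above is not specific to version 3. As a small sketch, assuming beautifulsoup4 is installed and using made-up sample markup, the same dictionary can be passed to 'find_all' in BeautifulSoup 4 and selects the same elements as the 'class_' keyword:

```python
from bs4 import BeautifulSoup

# Sample markup invented for this sketch.
html = (
    '<div class="content_class">a</div>'
    '<p class="content_class">b</p>'
    '<div>c</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Attribute filter given as a dictionary (second positional argument).
by_dict = soup.find_all("div", {"class": "content_class"})

# The same filter written with the class_ keyword argument.
by_keyword = soup.find_all("div", class_="content_class")

same = by_dict == by_keyword
matched_texts = [tag.get_text() for tag in by_dict]
print(same, matched_texts)
```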


2017-07-02 21:17:16

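Thanks. I tried that and the error went away, but now I get an empty array: '[]', even though I am sure the 'content_class' class is present on multiple divs in the document. –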

You really should not use BeautifulSoup version 3 any more. It has not been maintained for more than 5 years. –
