BeautifulSoup does not parse a div class
I am trying to process this page:
https://play.google.com/store/movies/details?id=3B6EBBD94D13B4DCMV
I read the HTML with the following code:
from BeautifulSoup import BeautifulSoup as BS
import requests


def read_html(url):
    try:
        res = requests.get(url)
        if res.status_code == 200:
            html_content = res.content
            soup = BS(html_content)
            return _get_type(soup)
        else:
            print res.status_code
    except ValueError, e:
        print e


def _get_type(soup):
    """Read Movie."""
    mydivs = soup.findAll("span", {"class": "DBzzzb"})
    if mydivs:
        return 'AVAILABLE'
    mydivs = soup.findAll("span", {"class": "DBzzzb"})
    if mydivs:
        return 'PREORDER'
    mydivs = soup.findAll("div", {"class": "Wc4pU"})
    if mydivs:
        return 'NOT_AVAILABLE'
    return 'INVALID'
My condition never matches: soup.findAll("div", {"class": "Wc4pU"})
even though that element is actually present in the HTML:
<div class="Wc4pU">We'll notify you on your wishlist when movies become available</div>
Source HTML:
view-source:https://play.google.com/store/movies/details?id=3B6EBBD94D13B4DCMV
Any suggestions?
You should use 'bs4' –
Changing to BS4 worked! – spicyramen
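For anyone landing here later, this is a rough sketch of what the switch to bs4 might look like. It is not the asker's final code. It assumes Python 3, the `beautifulsoup4` and `requests` packages, and the standard library `html.parser` backend. The class names are copied from the question and may no longer exist on the live page. The function name `get_type` and the 10-second timeout are illustrative choices. The PREORDER branch is left out because the posted code checks the same class for it as for AVAILABLE, so it could never be reached.

```python
import requests
from bs4 import BeautifulSoup


def get_type(soup):
    """Classify a parsed movie page using the class names from the question."""
    if soup.find_all("span", class_="DBzzzb"):
        return "AVAILABLE"
    if soup.find_all("div", class_="Wc4pU"):
        return "NOT_AVAILABLE"
    return "INVALID"


def read_html(url):
    """Fetch url and classify it; returns None on a request error or non-200 status."""
    try:
        res = requests.get(url, timeout=10)  # timeout value is an assumption
    except requests.RequestException as e:
        print(e)
        return None
    if res.status_code != 200:
        print(res.status_code)
        return None
    return get_type(BeautifulSoup(res.content, "html.parser"))


# Offline check against the snippet quoted in the question
snippet = (
    '<div class="Wc4pU">We\'ll notify you on your wishlist '
    "when movies become available</div>"
)
print(get_type(BeautifulSoup(snippet, "html.parser")))  # NOT_AVAILABLE
```

Calling `read_html` with the URL from the question would exercise the network path. Whether it still returns NOT_AVAILABLE depends on the markup Google Play serves today.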