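Web scraping with Python to extract data

I am using the code below. Everything works except the "affiliations" part, which raises an error: AttributeError: 'NoneType' object has no attribute 'text'. Without the .text, it returns everything, that is, the entire markup inside the class.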

import requests 
import bs4 
import re 

headers = {'User-Agent':'Mozilla/5.0'} 

url = 'http://pubs.acs.org/toc/jacsat/139/5' 
html = requests.get(url, headers=headers) 

soup = bs4.BeautifulSoup(html.text, 'lxml') 

tags = soup.findAll('a', href=re.compile("full")) 

for tag in tags: 
    new_url = tag.get('href', None) 
    newurl = 'http://pubs.acs.org' + new_url 
    newhtml = requests.get(newurl, headers=headers) 
    newsoup = bs4.BeautifulSoup(newhtml.text, 'lxml') 

    article_title = newsoup.find(class_="articleTitle").text 
    print(article_title) 

    affiliations = newsoup.find(class_="affiliations").text 
    print(affiliations) 

    authors = newsoup.find(id="authors").text 
    print(authors) 

    citation_year = newsoup.find(class_="citation_year").text 
    print(citation_year) 

    citation_volume = newsoup.find(class_="citation_volume").text 
    print(citation_volume) 

    citation = newsoup.find(id="citation").text 
    print(citation) 

    pubdate = newsoup.find(id="pubDate").text 
    print(pubdate)


2017-02-10 wus

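This exception is raised because find() did not find any element with the class "affiliations", so it returned None, and None has no .text attribute. I checked the source HTML of the first URL your script scrapes and could not find any element with that value as its class (or as any other attribute).

I would catch the error so that your script does not break, and fall back to None or a default string when the element is not found.

Something like this would work: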

try: 
    affiliations = newsoup.find(class_="affiliations").text 
    print(affiliations) 
except AttributeError: 
    affiliations = None
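
The same idea can be wrapped in a small helper so that every lookup in the loop is protected, not just the affiliations one. This is only a sketch: find_text is a hypothetical name, not part of BeautifulSoup, and it assumes the object passed in has a BeautifulSoup-style find() method that returns None when nothing matches.

```python
def find_text(soup, default=None, **criteria):
    """Return the stripped text of the first matching element, or `default`.

    `soup` is any object with a BeautifulSoup-style find() method.
    `criteria` is passed straight through to find(), for example
    class_="affiliations" or id="authors".
    find_text is a hypothetical helper, not a BeautifulSoup function.
    """
    element = soup.find(**criteria)
    if element is None:  # find() returns None when nothing matches
        return default
    return element.text.strip()
```

Inside the loop it could then be called as affiliations = find_text(newsoup, default="", class_="affiliations"), and likewise for the other fields. Unlike a broad except AttributeError, this checks for None explicitly, so an unrelated AttributeError elsewhere is not silently swallowed.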


2017-02-12 18:23:30
