Python - Check whether an article has an author

-2

I am trying to write a Python script that checks whether an article has an author.

I wrote the following:

import requests

s = "https://www.nytimes.com/2017/08/18/us/politics/steve-bannon-trump-white-house.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=a-lede-package-region&region=top-news&WT.nav=top-news"

def checkForAuthor(): 
    r = requests.get(s) 
    return "By" in r.text 

print(checkForAuthor())

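The problem is that checkForAuthor returns True even when there is no author, because it searches the entire HTML content for the word "By". Is there better logic for finding the author without searching the whole document? For example, searching only within the header, so that I don't even need to search the article body. I do need to make this search generic, so that it gives a result for any website I run it on. I'm not sure what is available for this.

Source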

2017-08-19 Kobbi Gal

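You should use a proper library to parse the HTML and check only the tags you are interested in. –

The key part of scraping data from a web page is looking at the page's HTML source so that you can locate the data correctly. In the link you provided, the following lines contain the author information.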

<meta name="author" content="Maggie Haberman, Michael D. Shear and Glenn Thrush" /> 
<meta name="byl" content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" /> 
<meta property="article:author" content="https://www.nytimes.com/by/maggie-haberman" /> 
<meta property="article:author" content="https://www.nytimes.com/by/michael-d-shear" /> 
<meta property="article:author" content="https://www.nytimes.com/by/glenn-thrush" />

There are other tags as well, but these should help. To parse these tags, you can use an HTML parsing library.
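
As a rough illustration of reading those tags, here is a minimal sketch that uses only the standard library's html.parser module (a swapped-in choice, not something this answer names). It runs offline on the tags quoted above; MetaCollector and collect_meta are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser

# The meta tags quoted above, used here as offline sample input.
SNIPPET = """
<meta name="author" content="Maggie Haberman, Michael D. Shear and Glenn Thrush" />
<meta name="byl" content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" />
<meta property="article:author" content="https://www.nytimes.com/by/maggie-haberman" />
<meta property="article:author" content="https://www.nytimes.com/by/michael-d-shear" />
<meta property="article:author" content="https://www.nytimes.com/by/glenn-thrush" />
"""


class MetaCollector(HTMLParser):
    """Hypothetical helper: records (key, content) for every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr = dict(attrs)
        # Some tags identify themselves with name=..., others with property=...
        key = attr.get("name") or attr.get("property")
        if key and attr.get("content"):
            self.pairs.append((key, attr["content"]))


def collect_meta(html):
    """Return a list of (key, content) pairs for the meta tags in html."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


pairs = collect_meta(SNIPPET)
author_links = [content for key, content in pairs if key == "article:author"]
print(dict(pairs).get("author"))
print(author_links)
```

Because only meta tags are inspected, a stray "By" in the article body cannot produce a false positive.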

Source

2017-08-19 11:41:44 TrigonaMinima

To parse the HTML and find the data you need, you should use the BeautifulSoup library.

In the HTML of your page, there is a meta tag that holds the authors:

<meta content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" name="byl"/>

So, to check whether there is an author, you need to find that tag by its name (byl):

import requests 
from bs4 import BeautifulSoup 

s = "https://www.nytimes.com/2017/08/18/us/politics/steve-bannon-trump-white-house.html?hp&action=click&pgtype=Homepage&clickSource=story-heading&module=a-lede-package-region&region=top-news&WT.nav=top-news" 

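def checkForAuthor():
    # Download the page and parse it with Python's built-in HTML parser
    soup = BeautifulSoup(requests.get(s).content, 'html.parser')
    # Look only for <meta name="byl" ...>, not for "By" anywhere in the text
    meta = soup.find('meta', {'name': 'byl'})
    return meta is not None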

In fact, you can also get the authors' names with meta["content"].

Source
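
Since the question asks for something that works across sites, one possible extension is to try several author-related meta tags in turn and return the content of the first match. The sketch below is only an assumption about how that could look: the AUTHOR_META list and the find_author name are made up for illustration, other sites may use different tags, and the sample HTML is a stand-in for a downloaded page so the example runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: a byline in <head> and a misleading "By" in the body.
SAMPLE_HTML = """
<html><head>
<meta name="byl" content="By MAGGIE HABERMAN, MICHAEL D. SHEAR and GLENN THRUSH" />
</head><body><p>By the way, this sentence is not a byline.</p></body></html>
"""

# Assumed list of (attribute, value) pairs to try, in order of preference.
AUTHOR_META = [
    ("name", "author"),
    ("name", "byl"),
    ("property", "article:author"),
]


def find_author(html):
    """Return the content of the first author-like meta tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for attr, value in AUTHOR_META:
        meta = soup.find("meta", attrs={attr: value})
        # Skip tags that are missing or have an empty content attribute
        if meta is not None and meta.get("content", "").strip():
            return meta["content"].strip()
    return None


print(find_author(SAMPLE_HTML))
print(find_author("<html><body><p>By the way, no author.</p></body></html>"))
```

Returning the content string rather than a bare boolean keeps both uses available: `find_author(html) is not None` answers the original yes/no question, and the value itself gives the byline.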

2017-08-19 11:57:57 Ricardo
