Scraping the HTML title and matching it against a word list - Python 3

I'm fairly new to Python, and my question is: how do I scrape an HTML title and match it against a word list in Python 3?

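I want to specify a website and have a Python module (such as BeautifulSoup) scrape the page title. It should print "Bingo" if the title matches any word in a word list, and otherwise print "Nothing here".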

My code is below. Any suggestions or ideas on how to make this work?

import urllib.request 
from bs4 import BeautifulSoup 

Match = ("Whois", "domain", "IP", "search") 

soup = BeautifulSoup(urllib.request.Request("https://whois.domaintools.com/")) 
if (soup.title.string in Match): 
    print ("Bingo") 
else: 
    print ("Nothing here!")


2017-05-26 veccct
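One likely reason the question's check never prints "Bingo": soup.title.string in Match asks whether the whole title is exactly one of the tuple's elements, not whether the title contains one of the words. The sketch below contrasts the two checks. The helper name title_matches and the sample title are made up for illustration and are not from the original post.

```python
Match = ("Whois", "domain", "IP", "search")


def title_matches(title, words):
    """Return True if any word in `words` appears as a substring of `title`."""
    if title is None:  # soup.title.string can be None, so guard before using `in`
        return False
    return any(word in title for word in words)


# Made-up sample title for illustration, not the real page title.
sample_title = "Whois Lookup - Example Site"

print(sample_title in Match)               # False: tuple membership needs an exact element
print(title_matches(sample_title, Match))  # True: "Whois" occurs inside the title
```

Like the question's code, this check is case-sensitive. Short words such as "IP" can also match inside longer words (for example "SHIP"), so a word-boundary regular expression may be preferable if that matters.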

Use the requests module:

import requests 
from bs4 import BeautifulSoup 

r = requests.get('https://whois.domaintools.com/') 

soup = BeautifulSoup(r.text, 'html.parser') 
print(r.text)

This prints the following message:

Please contact [email protected] and reference error #4311

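I had a sneaking suspicion that this was because they block scrapers. Indeed, once we send a browser-like User-Agent, the page loads correctly. So the fixed version becomes: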

import requests 
from bs4 import BeautifulSoup 

Match = ("Whois", "domain", "IP", "search") 

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'} 
r = requests.get('https://whois.domaintools.com/', headers=headers) 

soup = BeautifulSoup(r.text, 'html.parser') 

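for m in Match:
    if m in soup.title.string:
        print('Bingo!')
        break  # Exit the checking loop on the first match
else:
    # Runs only when the loop finished without a break, i.e. no word matched
    print('Nothing here!')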
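The question's code passed a urllib.request.Request object straight to BeautifulSoup, but a Request only describes an HTTP request and does not fetch anything. If you would rather stay with the standard library's urllib than switch to requests, a minimal sketch might open the request with urlopen and send the same browser-like User-Agent. The function names default_user_agent, build_request and fetch_html are hypothetical, and the assumption (as in the answer) is that the site only rejects clients based on this header. Note that urllib, like requests, sends its own library-identifying User-Agent by default, so the original urllib approach would likely hit the same block.

```python
import urllib.request

# Browser-like User-Agent string reused from the requests example above.
# Assumption: the site only checks this header, so any browser string should do.
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')


def default_user_agent():
    """Return the User-Agent urllib sends when none is given, e.g. 'Python-urllib/3.x'."""
    return dict(urllib.request.build_opener().addheaders)['User-agent']


def build_request(url, user_agent=BROWSER_UA):
    """Describe a GET request with a browser-like User-Agent (nothing is sent yet)."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=10):
    """Actually send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Decode with the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

Usage with BeautifulSoup, as in the answer: soup = BeautifulSoup(fetch_html('https://whois.domaintools.com/'), 'html.parser'), then run the same matching loop on soup.title.string.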


2017-05-26 01:02:17

Thank you so much, the User-Agent hurdle would have kept me stuck. Much appreciated. – veccct
