2010-04-09 26 views
1

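Python script to check a website for a tag

I'm trying to figure out how to write a website monitoring script (ultimately run as a cron job) that opens a given URL and checks whether a tag exists. If the tag is missing, or doesn't contain the expected data, the script should write something to a log file or send an email.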

The tag would be something like … or something fairly similar.

Does anyone have any ideas?

Answers

5

Your best bet, imo, is to check out BeautifulSoup. Something like this:

import urllib2 
from BeautifulSoup import BeautifulSoup 

page = urllib2.urlopen("http://yoursite.com") 
soup = BeautifulSoup(page) 

# See the docs on how to search through the soup. I'm not sure what 
# you're looking for so my example stops here :) 

After that, sending an email or logging the result is pretty standard fare.
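The search step that the snippet above leaves open might look roughly like the following. This is only a sketch: it swaps in the standard library's `html.parser` instead of BeautifulSoup so it runs without extra dependencies, it targets Python 3, and the `meta` tag with a `content` attribute of `"ok"` is a made-up example, since the question doesn't pin down the actual tag. `TagFinder` and `tag_has_value` are hypothetical names.

```python
from html.parser import HTMLParser


class TagFinder(HTMLParser):
    """Collect the attributes of every occurrence of one tag name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name.lower()
        self.matches = []  # one dict of attributes per matching tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == self.tag_name:
            self.matches.append(dict(attrs))


def tag_has_value(html, tag_name, attr, expected):
    """Return True if any <tag_name> element has attr equal to expected."""
    finder = TagFinder(tag_name)
    finder.feed(html)
    finder.close()
    return any(m.get(attr) == expected for m in finder.matches)


# Hypothetical page content, standing in for the fetched HTML.
sample = '<html><head><meta name="status" content="ok"></head></html>'
print(tag_has_value(sample, "meta", "content", "ok"))
print(tag_has_value(sample, "meta", "content", "down"))
```

The same check in BeautifulSoup would be shorter; the point here is only the shape of the logic: find the tag, then compare one of its attributes with the expected value.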

1

The following (untested) code uses urllib2 to fetch the page and re to search it.

import re
import urllib2

pageString = urllib2.urlopen('**insert url here**').read()
m = re.search(r'**insert regex for the tag you want to find here**', pageString)
if m is None:
    pass  # take action for NOT found here
else:
    pass  # take action for found here
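To make the regex placeholder a bit more concrete, here is a small sketch in Python 3 that runs against an in-memory string instead of a live page. The pattern, the `status` tag name, and the `find_status` helper are all assumptions for illustration; the real pattern depends on the tag being monitored. Note that in Python 3 the fetched bytes would need decoding before a `str` pattern can be applied to them.

```python
import re

# Hypothetical pattern: a <meta> tag named "status"; group 1 captures its content.
META_RE = re.compile(
    r'<meta\s+name="status"\s+content="([^"]*)"',
    re.IGNORECASE,
)


def find_status(page_string):
    """Return the captured content value, or None if the tag is absent."""
    m = META_RE.search(page_string)
    return m.group(1) if m is not None else None


print(find_status('<head><meta name="status" content="ok"></head>'))
print(find_status('<head><title>no tag here</title></head>'))
```

A pattern like this is brittle: it breaks if the attributes appear in a different order or use single quotes, which is one reason an HTML parser is often the safer choice.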

The following (untested) code uses pycurl and StringIO to fetch the page and re to search it.

import re
import StringIO

import pycurl

# Collect the response body in an in-memory buffer
b = StringIO.StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, '**insert url here**')
c.setopt(pycurl.WRITEFUNCTION, b.write)
c.perform()
c.close()
m = re.search(r'**insert regex for the tag you want to find here**', b.getvalue())
if m is None:
    pass  # take action for NOT found here
else:
    pass  # take action for found here
2

Here is some sample code (untested) that logs the result and sends mail, using BeautifulSoup:

#!/usr/bin/env python
import logging
import smtplib
import urllib2

# Log config
logging.basicConfig(filename='/tmp/yourscript.log', level=logging.INFO)

# The helper functions are defined before the code that calls them
def check_content(data):
    # Your BeautifulSoup logic here; return True when the expected content is present
    content_found = False  # placeholder until the real check is written
    return content_found

def send_mail(message_body):
    server = 'localhost'
    recipients = ['[email protected]']
    sender = '[email protected]'
    # Each header starts at the beginning of a line; a blank line precedes the body
    message = 'From: %s\nSubject: script result\n\n%s' % (sender, message_body)
    session = smtplib.SMTP(server)
    session.sendmail(sender, recipients, message)
    session.quit()

# Open requested url
url = "http://yoursite.com/tags/yourTag"
data = urllib2.urlopen(url)

if check_content(data):
    # Report to log
    logging.info('Content found')
else:
    # Send mail
    send_mail('Content not found')

I'll leave writing the check_content() function to you.
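For the reporting half, one possible Python 3 variant builds the message with `email.message.EmailMessage` instead of formatting the headers by hand. This is a sketch under assumptions: the addresses are placeholders, `build_message` and `report` are hypothetical names, and the `send` callable is passed in so the logic can be exercised without an SMTP server.

```python
import logging
from email.message import EmailMessage


def build_message(sender, recipients, body, subject="script result"):
    """Build a plain-text email; headers are set individually, not by hand."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def report(content_found, send, log=logging.getLogger(__name__)):
    """Log when the content is found; otherwise build an alert and hand it to send."""
    if content_found:
        log.info("Content found")
        return None
    # Placeholder addresses, assumed for illustration only.
    msg = build_message("monitor@example.com", ["admin@example.com"],
                        "Content not found")
    send(msg)
    return msg


# Demo: collect the message in a list instead of actually sending it.
sent = []
report(False, sent.append)
print(sent[0]["Subject"])
```

In a real run, `send` could be the `send_message` method of an `smtplib.SMTP("localhost")` session, mirroring the localhost server used in the answer above.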