2012-12-16 89 views

Urllib2 Python error

I have been writing a downloader for the robots.txt files of a list of websites, using Python and urllib2. Here is the code:

    import MySQLdb
    import urllib
    import urllib2

    clone = 0
    db = MySQLdb.connect("127.0.0.1", "root", "", "research")
    cursor = db.cursor()
    sql = "SELECT * FROM sites"
    try:
        cursor.execute(sql)
        # Fetch all the rows in a list of lists.
        results = cursor.fetchall()
        for row in results:
            id = row[0]
            website = row[1]
            website = website + "robots.txt"
            print website
            try:
                check = urllib2.urlopen(website, timeout=10).code
                if not check:
                    print "No WEBSERVER FOUND"
                    clone = 1
            except IOError:
                clone = 1
                print "No Webserver Found"
            if (check == 200 or clone == 0):
                sql2 = "UPDATE sites SET robots_txt_available=1 WHERE ID=%s" % \
                    (id)
                cursor.execute(sql)
                print website, " Has Robots.txt.";
            else:
                print website, " does not Have robots.txt."
    except:
        print "Error: unable to fecth data"

    # disconnect from server
    db.close()

The output of the code is:

http://rashtrapatisachivalaya.gov.in/robots.txt 
No Webserver Found 
Error: unable to fecth data 

So it does not run to completion. Can anyone tell me what is wrong with this code?


Somehow, I was expecting variable names made of mixed-up words... –

Answers

1

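What is your point? The given URL simply does not exist, so the code in the except clause is executed. And the access to the `code` attribute only happens when no exception is raised...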
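
A minimal sketch of what most likely produces the second line of the output, assuming the failing call is replaced by a stand-in so that no network is needed. The names `status_check`, `unreachable` and `run` are hypothetical and not part of the question. When the opener raises, `check` is never assigned, so the later comparison raises a NameError. In the question's script the bare `except:` around the loop catches that error, prints "Error: unable to fecth data" and ends the loop.

```python
def status_check(opener):
    """Mimic the question's inner try/except with a pluggable opener."""
    try:
        check = opener()  # this assignment is skipped when opener() raises
    except IOError:
        print("No Webserver Found")
    # If the except branch ran, `check` was never bound, so this comparison
    # raises a NameError (an UnboundLocalError when inside a function).
    return check == 200


def unreachable():
    """Hypothetical stand-in for urllib2.urlopen failing on a dead host."""
    raise IOError("host not found")


def run():
    """Return a short description of what happens for an unreachable site."""
    try:
        return status_check(unreachable)
    except NameError as exc:
        return "NameError: %s" % exc
```

Replacing the bare `except:` with a narrower one, or printing the caught exception, would have shown this error instead of the misleading database message.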

The proper solution is:

    import urllib2

    try:
        urllib2.urlopen("some url")
    except urllib2.HTTPError as err:
        if err.code == 404:
            pass  # <whatever>: handle the missing page here
        else:
            raise

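Yes, that is exactly what I was trying to do. If one website does not exist, the rest of the code does not execute (i.e. the remaining websites are skipped). – NotToBeKnown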
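
One possible way to keep the loop going for the remaining websites, offered as a sketch rather than a definitive fix: handle the failure per site and turn it into a plain True/False result, so that nothing is left undefined afterwards. The helper names `has_robots_txt` and `check_sites`, the `opener` parameter and the Python 3 import fallback are assumptions added for illustration. Note that a host that cannot be resolved raises `urllib2.URLError`, not `urllib2.HTTPError`. Both are IOError subclasses, which is why the sketch catches IOError.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    # Assumption: on Python 3, urllib.request exposes the same names used here.
    import urllib.request as urllib2


def has_robots_txt(url, opener=None, timeout=10):
    """Return True if `url` answers with HTTP 200, False otherwise."""
    if opener is None:
        opener = urllib2.urlopen
    try:
        response = opener(url, timeout=timeout)
    except IOError:
        # HTTPError (e.g. 404), URLError (no such host) and socket timeouts
        # are all IOError subclasses: treat each as "no robots.txt".
        return False
    return response.code == 200


def check_sites(sites, opener=None):
    """Check every (id, base_url) pair; a dead site does not stop the loop."""
    results = {}
    for site_id, base_url in sites:
        results[site_id] = has_robots_txt(base_url + "robots.txt", opener=opener)
    return results
```

The rows returned by `cursor.fetchall()` could be passed in as `sites`, and the database updated for each id whose result is True. The question's code also builds `sql2` but then calls `cursor.execute(sql)`, so the UPDATE would never run even for reachable sites. With MySQLdb, passing the value separately, as in `cursor.execute("UPDATE sites SET robots_txt_available=1 WHERE ID=%s", (site_id,))`, avoids building the statement by string formatting.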


Sorry, what? Please ask a coherent question... –