2016-09-15 52 views
urllib.error.HTTPError: HTTP Error 403: Forbidden (Python)

My code:

import os
import sqlite3
import sys
import urllib.request
from xml.dom import minidom

# Stop early if the database is missing; otherwise `cursor` is undefined below.
if not os.path.exists("data.db"):
    sys.exit("ERROR: data.db not found")

con = sqlite3.connect("data.db")
cursor = con.cursor()
# fetchall() frees the cursor so it can be reused for the INSERTs below.
rows = cursor.execute("SELECT * FROM data WHERE test = ?", ("123",)).fetchall()

for dsatz in rows:
    link = 'http://test.org/publication/' + dsatz[0] + dsatz[1] + '/bib'

    web_data = urllib.request.urlopen(link)  # the HTTP 403 is raised here
    xmldoc = minidom.parse(web_data)

    # Only the first <document-id> element is used.
    for x in xmldoc.getElementsByTagName("document-id")[:1]:
        publicationcountry = x.getElementsByTagName("country")[0].firstChild.data
        publicationdocnumber = x.getElementsByTagName("doc-number")[0].firstChild.data
        publicationkind = x.getElementsByTagName("kind")[0].firstChild.data
        publicationdate = x.getElementsByTagName("date")[0].firstChild.data  # read but not stored

        # Parameterized INSERT; the original built this statement but never executed it.
        cursor.execute(
            "INSERT INTO link_xml_data VALUES (?, ?, ?)",
            (publicationcountry, publicationdocnumber, publicationkind),
        )

# Commit and close once, after all links are processed (not inside the loop).
con.commit()
con.close()

But after about 15 links I get this error:

Traceback (most recent call last):
  File "C:\Users\j\3.py", line 34, in <module>
    web_data = urllib.request.urlopen(link)
  File "C:\Users\j\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\j\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\j\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\j\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\j\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\j\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

What should I add or change?

Answer

The web server is telling you that the link is forbidden. There is (probably) nothing wrong with your code.
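Since the refusal comes from the server, one thing you could do is catch the exception so a single 403 does not abort the whole loop, and log which link was refused. This is only a minimal sketch: `try_fetch` and its `opener` parameter are names I made up for illustration (the parameter just makes the function easy to test without a network), not something from your script.

```python
import urllib.error
import urllib.request


def try_fetch(link, opener=urllib.request.urlopen):
    """Return (status, body); body is None when the server refuses the request."""
    try:
        with opener(link) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric HTTP status (403 in the traceback above);
        # err.reason is the short text the server sent with it.
        print("request failed:", link, err.code, err.reason)
        return err.code, None
```

In the loop you would then skip (or record) any link whose body comes back as `None` instead of passing it to `minidom.parse`.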

Do some links always work and others always fail, or does the pattern change over time?

After getting the 403 Forbidden response, have you tried going back and re-requesting one of the links that succeeded earlier?
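That check can be automated. The sketch below walks the links, and on the first 403 it immediately re-requests the last link that worked. The helper names `http_status` and `probe_after_403` are hypothetical, and treating status 200 as "worked" is my assumption about how the server responds to good requests.

```python
import urllib.error
import urllib.request


def http_status(link):
    """Return the HTTP status code for link, including error statuses."""
    try:
        with urllib.request.urlopen(link) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def probe_after_403(links, get_status=http_status):
    """On the first 403, re-request the last link that succeeded.

    Returns (failed_link, earlier_link, earlier_status_now),
    or None if no link returned 403.
    """
    last_ok = None
    for link in links:
        status = get_status(link)
        if status == 403:
            if last_ok is None:
                return link, None, None
            return link, last_ok, get_status(last_ok)
        if status == 200:
            last_ok = link
    return None
```

If the earlier link now returns 403 as well, the server is probably refusing you as a client rather than that particular document. If it still returns 200, the failing link itself is more likely restricted.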

Maybe the server eventually identified you as a web scraper and is telling you to go away?
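If that is what is happening, slowing down and identifying your client might help, though I cannot tell from the traceback what this server actually checks, so treat the following as a guess to experiment with. It pauses between requests and sends an explicit `User-Agent` header. The function name, the 2-second delay, and the user-agent string are all placeholders I chose; check the site's terms of use and robots.txt before fetching in bulk.

```python
import time
import urllib.request


def polite_fetch_all(links, delay_seconds=2.0,
                     user_agent="my-bib-fetcher/0.1",
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch each link with a pause in between; return {link: body bytes}."""
    results = {}
    for index, link in enumerate(links):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        request = urllib.request.Request(link, headers={"User-Agent": user_agent})
        with opener(request) as response:
            results[link] = response.read()
    return results
```

The `opener` and `sleep` parameters exist only so the function can be exercised without a network or real waiting; in normal use you would leave them at their defaults.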