How do I fix the "ValueError: read of closed file" exception? This simple Python 3 script:
import urllib.request
host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve(url, filename)
raises this exception:
Traceback (most recent call last):
  File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module>
    urllib.request.urlretrieve(url, filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1597, in retrieve
    block = fp.read(bs)
ValueError: read of closed file
I thought this might be a temporary problem, so I added some simple exception handling, like this:
import random
import time
import urllib.request
host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
while True:
    try:
        print("Downloading...")
        time.sleep(random.randint(0, 5))
        urllib.request.urlretrieve(url, filename)
        break
    except ValueError:
        pass
But this just prints Downloading... over and over, forever.
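One way to avoid the endless loop is to cap the number of attempts and to stop right away on errors that a retry cannot fix. What follows is only a minimal sketch under assumptions, not a confirmed fix: download_with_retries and fetch_to_file are hypothetical helper names, and it assumes the server is answering with an HTTP error status, which the traceback above does not confirm. It uses urllib.request.urlopen, which raises urllib.error.HTTPError for error responses.

```python
import time
import urllib.error
import urllib.request


def fetch_to_file(url, filename):
    """Hypothetical helper: download url and write the body to filename."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    with open(filename, "wb") as out:
        out.write(body)


def download_with_retries(fetch, max_attempts=3, delay=1.0):
    """Call fetch() up to max_attempts times; return the attempt that succeeded.

    HTTP 4xx responses (for example 403) are treated as permanent and
    re-raised immediately. Other URL errors are assumed to be transient.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            fetch()
            return attempt
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before URLError (it is a subclass).
            if 400 <= err.code < 500:
                raise  # retrying a client error will not help
            last_error = err
        except urllib.error.URLError as err:
            last_error = err
        if attempt < max_attempts:
            time.sleep(delay)
    raise RuntimeError("giving up after %d attempts" % max_attempts) from last_error
```

With the variables from the script above, it could be called as download_with_retries(lambda: fetch_to_file(url, filename)). Passing the download step in as a callable keeps the retry logic separate from the network code, so the loop can be exercised without contacting the server.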
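If you look at 'http://scholar.google.com/robots.txt', you can see that Google forbids automated downloads of this page. If you try it with 'wget', you get a '403 Forbidden' error. I suspect the same thing is happening to your script. – 2012-07-17 22:31:22
@senderle There is no API, so I am parsing it by hand. – 2012-07-17 22:38:52
@senderle, most likely you need to send a cookie to get the content. – 2012-07-17 22:47:03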
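The robots.txt point raised in the comments can also be checked from Python with the standard urllib.robotparser module. This is a minimal sketch: the two rule lines are an assumed excerpt for illustration, not a verified copy of Google's file, and is_allowed is a hypothetical helper name.

```python
import urllib.robotparser

# Assumed excerpt for illustration only; the real file may differ.
ASSUMED_RULES = [
    "User-agent: *",
    "Disallow: /scholar",
]


def is_allowed(rules, url, user_agent="*"):
    """Return True if the given robots.txt rule lines permit fetching url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(rules)  # parse rule lines directly, no network access
    return parser.can_fetch(user_agent, url)
```

To test against the live file instead, RobotFileParser also offers set_url() followed by read(), which performs a network request. Note that robots.txt only states the site's policy; it does not by itself explain the 403 response.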