How do I fix the "ValueError: read of closed file" exception?

This simple Python 3 script:

import urllib.request 

host = "scholar.google.com" 
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0" 
url = "http://" + host + link 
filename = "cite0.bib" 
print(url) 
urllib.request.urlretrieve(url, filename)

raises this exception:

Traceback (most recent call last): 
    File "C:\Users\ricardo\Desktop\Google-Scholar\BibTex\test2.py", line 8, in <module> 
    urllib.request.urlretrieve(url, filename) 
    File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve 
    return _urlopener.retrieve(url, filename, reporthook, data) 
    File "C:\Python32\lib\urllib\request.py", line 1597, in retrieve 
    block = fp.read(bs) 
ValueError: read of closed file

I thought this might be a transient problem, so I added some simple exception handling, like this:

import random
import time
import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
while True:
    try:
        print("Downloading...")
        time.sleep(random.randint(0, 5))
        urllib.request.urlretrieve(url, filename)
        break
    except ValueError:
        pass

But this just prints "Downloading..." forever.

Source

2012-07-17 Ricardo Altamirano

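If you look at 'http://scholar.google.com/robots.txt', you can see that Google forbids automated downloads of this page. If you try 'wget', you get a '403 Forbidden' error. I suspect the same thing is happening to your script. – 2012-07-17 22:31:22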

@senderle There is no API, so I am parsing it by hand. – 2012-07-17 22:38:52

@senderle, most likely you need to send a cookie to get the content. – 2012-07-17 22:47:03

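Your URL returns a 403 error code, and apparently urllib.request.urlretrieve is not good at detecting all HTTP errors. It uses urllib.request.FancyURLopener, which swallows the error by returning a urlinfo instead of raising an exception. The response has already been closed by then, so the later read fails with "ValueError: read of closed file".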

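As for the fix: if you still want to use urlretrieve, you can get around this by overriding FancyURLopener like this (the code is included so the error also becomes visible):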

import urllib.request
from urllib.request import FancyURLopener


# This targets the Python 3.2 code path shown in the traceback;
# FancyURLopener is deprecated since Python 3.3.
class FixFancyURLOpener(FancyURLopener):

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        # Raise on 403 instead of silently returning a response object.
        if errcode == 403:
            raise ValueError("403")
        return super().http_error_default(url, fp, errcode, errmsg, headers)


# Monkey patch: make urlretrieve use the fixed opener class.
urllib.request.FancyURLopener = FixFancyURLOpener

url = "http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
urllib.request.urlretrieve(url, "cite0.bib")

Otherwise, this is what I would recommend: use urllib.request.urlopen, like this:

import urllib.request

url = 'http://scholar.google.com/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0'
with urllib.request.urlopen(url) as fp, open("cite0.bib", "wb") as fo:
    fo.write(fp.read())  # read() returns bytes, so the file is opened in binary mode
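
As a rough illustration of the urlopen approach, the sketch below catches the HTTP error explicitly, so a 403 is reported instead of being swallowed or retried forever. The function name `download` and its `opener` parameter are hypothetical and not part of the original answer; `opener` is there only so the function can be tried without network access.

```python
import urllib.error
import urllib.request


def download(url, filename, opener=urllib.request.urlopen):
    """Save the body of `url` to `filename`.

    Returns None on success, or the HTTP status code if the server
    answered with an error. `download` and `opener` are hypothetical
    names used for this sketch only.
    """
    try:
        with opener(url) as response:
            data = response.read()  # bytes
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error responses such as 403.
        return err.code
    with open(filename, "wb") as out:
        out.write(data)
    return None
```

With this shape, a retry loop can stop as soon as it sees a non-transient status such as 403, rather than looping on an exception that will never go away.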

Source

2012-07-17 22:56:46 mouad

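Thanks for your help. +1 and accepted for the monkey patch and the general help, although I have since realized that, as the comments above point out, 'robots.txt' disallows downloading these files. I completely forgot to check. – 2012-07-18 10:57:25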
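
The robots.txt check mentioned in the comments can also be done from Python with the standard library's urllib.robotparser. This is a minimal sketch: the helper name `is_allowed` is hypothetical, and the rules below are made up for illustration, not copied from Google's actual robots.txt.

```python
import urllib.robotparser


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if `url` may be fetched under the given robots.txt lines.

    `is_allowed` is a hypothetical helper name used for this sketch.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only, not Google's real robots.txt.
example_rules = [
    "User-agent: *",
    "Disallow: /scholar",
]

print(is_allowed(example_rules, "http://scholar.google.com/scholar.bib?q=x"))
print(is_allowed(example_rules, "http://scholar.google.com/intl/en/about.html"))
```

To check the live file instead of a hard-coded list, RobotFileParser also offers set_url() and read(), which download and parse the site's robots.txt before can_fetch() is called.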
