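Using PDFMiner as a library: AttributeError: 'NoneType' object has no attribute 'getobj'

I am writing a script that uploads PDF files and parses them in the process. For the parsing I use PDFMiner.

To open a file and turn it into a PDFMiner document, I use the following function, which closely follows the instructions linked above: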
    def load_document(self, _file = None):
        """turn the file into a PDFMiner document"""
        if _file == None:
            _file = self.options['file']
        parser = PDFParser(_file)
        doc = PDFDocument()
        doc.set_parser(parser)
        if self.options['password']:
            password = self.options['password']
        else:
            password = ""
        doc.initialize(password)
        if not doc.is_extractable:
            raise ValueError("PDF text extraction not allowed")
        return doc
The expected result is of course a nice PDFDocument instance, but instead I get an error:
    Traceback (most recent call last):
      File "bzk_pdf.py", line 45, in <module>
        cli.run_cli(BZKPDFScraper)
      File "/home/toon/Projects/amcat/amcat/scripts/tools/cli.py", line 61, in run_cli
        instance = cls(options)
      File "/home/toon/Projects/amcat/amcat/scraping/pdf.py", line 44, in __init__
        self.doc = self.load_document()
      File "/home/toon/Projects/amcat/amcat/scraping/pdf.py", line 56, in load_document
        doc.set_parser(parser)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfparser.py", line 327, in set_parser
        self.info.append(dict_value(trailer['Info']))
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdftypes.py", line 132, in dict_value
        x = resolve1(x)
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdftypes.py", line 60, in resolve1
        x = x.resolve()
      File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdftypes.py", line 49, in resolve
        return self.doc.getobj(self.objid)
    AttributeError: 'NoneType' object has no attribute 'getobj'
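I don't know where to look, and I haven't found anyone else with the same problem.

Some additional information that might help:
- Here is my test file: http://www.2shared.com/document/kM_wrI3J/testpdf.html
- _file is a Django File object, but using a plain file gives the same result
- pdfminer version: 'pdfminer-20110515'
- Django: 1.4.3 (I don't think it matters)
- Python 2.7.3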
A minor point, but I think you mean Django version 1.4.3. – 2013-02-17 09:26:40
Has anyone found an answer? Or tried to reproduce the problem? I really need an answer... – ToonAlfrink 2013-02-17 12:12:23
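
A possible explanation, offered as a guess from the traceback rather than a confirmed diagnosis: the failure happens inside `resolve()`, where `self.doc` is `None`. That suggests the parser never learned which document it belongs to. The usage example in the PDFMiner documentation for this release connects the two objects in both directions, calling `parser.set_document(doc)` before `doc.set_parser(parser)`, and the first of those calls is missing from `load_document`. The toy model below shows the mechanism. All `Toy*` classes and the `link` helper are hypothetical stand-ins written for this illustration, not PDFMiner's actual code.

```python
# Toy model only: hypothetical stand-ins that mimic the suspected mechanism.
# None of these classes come from PDFMiner.


class ToyObjRef(object):
    """An indirect reference that needs a document to look itself up."""

    def __init__(self, doc, objid):
        self.doc = doc
        self.objid = objid

    def resolve(self):
        # Raises AttributeError when self.doc is None.
        return self.doc.getobj(self.objid)


class ToyParser(object):
    def __init__(self):
        self.doc = None  # stays None until set_document() is called

    def set_document(self, doc):
        self.doc = doc

    def read_trailer(self):
        # The reference captures whatever document the parser knows right now.
        return {'Info': ToyObjRef(self.doc, 1)}


class ToyDocument(object):
    def __init__(self):
        self.objects = {1: {'Title': 'test'}}
        self.info = []

    def getobj(self, objid):
        return self.objects[objid]

    def set_parser(self, parser):
        # Resolving the trailer reference only works if the parser
        # was told about this document beforehand.
        trailer = parser.read_trailer()
        self.info.append(trailer['Info'].resolve())


def link(call_set_document):
    """Wire a toy parser and document, optionally skipping set_document()."""
    parser = ToyParser()
    doc = ToyDocument()
    if call_set_document:
        parser.set_document(doc)
    doc.set_parser(parser)
    return doc
```

Calling `link(True)` returns a document with its info dictionary filled in, while `link(False)` raises an `AttributeError` mentioning `getobj`, the same shape as the error in the question. If this guess is right, adding `parser.set_document(doc)` just before `doc.set_parser(parser)` in `load_document` would be the thing to try.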