Parsing a PDF from the web in Python 3

I'm trying to fetch a PDF from a web page, parse it, and print the result to the screen using PyPDF2. I got it working without problems with the following code:

with open("foo.pdf", "wb") as f: 
    f.write(requests.get(buildurl(jornal, date, page)).content) 
pdfFileObj = open('foo.pdf', "rb") 
pdf_reader = PyPDF2.PdfFileReader(pdfFileObj) 
page_obj = pdf_reader.getPage(0) 
print(page_obj.extractText())

Writing to a file just so that I can read it back seems wasteful, though, so I figured I would cut out the middleman:

pdf_reader = PyPDF2.PdfFileReader(requests.get(buildurl(jornal, date, page)).content) 
page_obj = pdf_reader.getPage(0) 
print(page_obj.extractText())

However, this gives me an AttributeError: 'bytes' object has no attribute 'seek'. How can I feed the PDF coming from requests directly into PyPDF2?

Source

2016-07-30 Bernardo Meurer

You have to convert the returned content into a file-like object:

import io 

pdf_content = io.BytesIO(requests.get(buildurl(jornal, date, page)).content) 
pdf_reader = PyPDF2.PdfFileReader(pdf_content)
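
To see why the wrapper is needed, here is a minimal sketch using only the standard library. The `raw` value is a placeholder standing in for `requests.get(...).content`; it is not a valid PDF, and the variable names are made up for illustration. It shows that a `bytes` object has no `seek`, while an `io.BytesIO` built from the same bytes can be repositioned like an open file, which is what the error message suggests PyPDF2 expects.

```python
import io

# Stand-in for requests.get(...).content; placeholder bytes, not a real PDF.
raw = b"%PDF-1.4 placeholder"

# bytes is an immutable sequence with no file position, hence no seek().
bytes_has_seek = hasattr(raw, "seek")

# BytesIO wraps the same bytes in an in-memory binary stream.
stream = io.BytesIO(raw)
stream_has_seek = hasattr(stream, "seek")

# Jump to the end and back, the way a parser moving around a file might.
end_pos = stream.seek(0, io.SEEK_END)
stream.seek(0)
first_bytes = stream.read(5)

print(bytes_has_seek, stream_has_seek, end_pos, first_bytes)
```

Nothing is written to disk here; the stream lives entirely in memory.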

Source

2016-07-30 21:03:11

Use io to fake a file (in Python 3):

import io 

output = io.BytesIO() 
output.write(requests.get(buildurl(jornal, date, page)).content) 
output.seek(0) 
pdf_reader = PyPDF2.PdfFileReader(output)

I haven't tested this in your environment, but I did test this simple example, and it works:

import io 

output = io.BytesIO() 
output.write(bytes("hello world","ascii")) 
output.seek(0) 
print(output.read())

Output:

b'hello world'
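
As a side note, the write-then-seek steps above and passing the bytes straight to the `io.BytesIO` constructor (as in the other answer) should give equivalent streams. A small sketch to check this, with `payload` as an illustrative stand-in for the downloaded content:

```python
import io

payload = b"hello world"  # illustrative stand-in for the downloaded bytes

# Variant 1: create an empty buffer, write into it, then rewind.
written = io.BytesIO()
written.write(payload)
written.seek(0)

# Variant 2: pass the bytes to the constructor; the position starts at 0.
constructed = io.BytesIO(payload)
start_pos = constructed.tell()

written_data = written.read()
constructed_data = constructed.read()
print(written_data == constructed_data)
```

The constructor form is shorter; the explicit `seek(0)` only matters when the buffer was filled with `write`, because writing leaves the position at the end of the data.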

Source

2016-07-30 21:00:22

Sorry, I forgot to mention that I need Python 3 compatibility –
