I installed pyPDF successfully, but its extractText method did not work well, so I decided to try PyPDF2. The problem is that extracting text raises a TypeError:
Traceback (most recent call last):
  File "C:\Users\Asus\Desktop\pfdtest.py", line 44, in <module>
    test2()
  File "C:\Users\Asus\Desktop\pfdtest.py", line 41, in test2
    print(mypdf.getPage(0).extractText())
  File "C:\Python32\lib\site-packages\PyPDF2\pdf.py", line 1701, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Python32\lib\site-packages\PyPDF2\pdf.py", line 1783, in __init__
    stream = StringIO(stream.getData())
TypeError: initial_value must be str or None, not bytes
Here is my sample code:
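from PyPDF2 import PdfFileReader

filename = "myfile.pdf"
f = open(filename, 'rb')  # PDFs must be opened in binary mode
mypdf = PdfFileReader(f)
print(f, mypdf, mypdf.getNumPages())
print(mypdf.getPage(0).extractText())  # this call raises the TypeError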
It correctly determines the number of pages in the PDF, but it fails when reading the page's content stream.
Did you ever find a solution? – juankysmith
Unfortunately not, but it has been a while, so maybe they have fixed it by now. –
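
For context on the error message itself: the last frame of the traceback passes the result of stream.getData() to StringIO, and on Python 3 a text buffer refuses bytes. The sketch below reproduces that with the standard library alone. It assumes the StringIO in PyPDF2's pdf.py behaves like io.StringIO, and the raw value is a made-up stand-in for the stream data, not anything read from a real PDF.

```python
import io

# Hypothetical stand-in for what stream.getData() returns on Python 3: bytes.
raw = b"BT (Hello) Tj ET"

try:
    io.StringIO(raw)  # same kind of call as the last traceback frame
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# A bytes-oriented buffer accepts the same data.
buffer = io.BytesIO(raw)
first_token = buffer.read(2)
print(first_token)
```

This suggests the failure comes from mixing text and bytes buffers inside the library on Python 3 rather than from the sample code or the PDF file, though whether a given PyPDF2 release has changed that call is not something this sketch can confirm.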