2017-03-03

PyPDF2: Stream has ended unexpectedly

I have a Python script that uses PyPDF2 to reverse the order of the pages in a PDF.

from PyPDF2 import PdfFileWriter, PdfFileReader 

output = PdfFileWriter() 
rpage = [] 
name = input("What's the file called?") 

filename = name.split('.', 1) 

input1 = PdfFileReader(open(name,'rb'), strict = False) 

pages = list(range(1,input1.getNumPages() + 1)) 

for i in range(0, (input1.getNumPages())): 
    rpage.append(pages[input1.getNumPages() - i -1]) 
for i in rpage: 
    output.addPage(input1.getPage(i-1)) 

outputpath = filename[0] + '-reversed.pdf' 

outputStream = open(outputpath, "wb") 
output.write(outputStream) 

It works as expected until it tries to write to the output stream, at which point it raises this error:

PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573] 
Traceback (most recent call last): 
    File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\reverse pdf.py", line 22, in <module> 
    output.write(outputStream)
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, data[i]) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences 
    newobj = data.pdf.getObject(data) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject 
    retval = readObject(self.stream, self) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject 
    return DictionaryObject.readFromStream(stream, pdf) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 611, in readFromStream 
    data["__streamdata__"] = stream.read(length) 
TypeError: integer argument expected, got 'NullObject' 

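The code does create a PDF file, but it is 0 KB in size and therefore unreadable. I also tested the example script for merging three PDF files found here; it produced another empty file and raised this error: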

PdfReadWarning: Invalid stream (index 59) within object 108 0: Stream has ended unexpectedly [pdf.py:1573] 
Traceback (most recent call last): 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1567, in _getObjectFromStream 
    obj = readObject(streamData, self) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 98, in readObject 
    return NumberObject.readFromStream(stream) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 269, in readFromStream 
    num = utils.readUntilRegex(stream, NumberObject.NumberPattern) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\utils.py", line 134, in readUntilRegex 
    raise PdfStreamError("Stream has ended unexpectedly") 
PyPDF2.utils.PdfStreamError: Stream has ended unexpectedly 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "D:\Documents\Google Drive\Programming\Python\PDF Scripts\untitled1.py", line 27, in <module> 
    merger.write(output) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\merger.py", line 230, in write 
    self.output.write(fileobj) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 482, in write 
    self._sweepIndirectReferences(externalReferenceMap, self._root) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 556, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, data[i]) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 571, in _sweepIndirectReferences 
    self._sweepIndirectReferences(externMap, realdata) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 547, in _sweepIndirectReferences 
    value = self._sweepIndirectReferences(externMap, value) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 577, in _sweepIndirectReferences 
    newobj = data.pdf.getObject(data) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1611, in getObject 
    retval = readObject(self.stream, self) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 66, in readObject 
    return DictionaryObject.readFromStream(stream, pdf) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\generic.py", line 609, in readFromStream 
    length = pdf.getObject(length) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1593, in getObject 
    retval = self._getObjectFromStream(indirectReference) 
    File "C:\Users\Charles\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1576, in _getObjectFromStream 
    raise utils.PdfReadError("Can't read object stream: %s"%e) 
PyPDF2.utils.PdfReadError: Can't read object stream: Stream has ended unexpectedly 
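Judging from the tracebacks, PdfFileWriter.write() fails inside _sweepIndirectReferences, before it has written anything to the stream, while open(outputpath, "wb") has already created or truncated the output file; that would explain the 0 KB result. A minimal sketch of one way to avoid leaving empty files behind is to serialize into an in-memory buffer first and only create the file if that succeeds. The helper name write_pdf_safely is hypothetical, and it assumes only that the writer has a write(stream) method, as PyPDF2's PdfFileWriter does. It does not repair the damaged object stream itself.

```python
import io


def write_pdf_safely(writer, path):
    """Hypothetical helper: serialize `writer` in memory, create `path` only on success.

    Assumes `writer` exposes write(stream), like PyPDF2's PdfFileWriter.
    Returns the number of bytes written to `path`.
    """
    buffer = io.BytesIO()
    # If write() raises (e.g. PdfReadError from a damaged object stream),
    # the exception propagates before `path` is opened, so no 0 KB file is created.
    writer.write(buffer)
    data = buffer.getvalue()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

With PyPDF2 this would replace the last two lines of the reversing script: write_pdf_safely(output, outputpath).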

The same error was also output when I used this script to split the PDF into its individual pages:

from PyPDF2 import PdfFileWriter, PdfFileReader 
infile = PdfFileReader(open('test.pdf', 'rb')) 

for i in range(infile.getNumPages()): 
    p = infile.getPage(i) 
    outfile = PdfFileWriter() 
    outfile.addPage(p) 
    with open('page-%02d.pdf' % i, 'wb') as f: 
        outfile.write(f)

The code above produces n-1 readable PDFs, but the nth PDF is an empty file. Any ideas how I can fix this?
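To narrow down which pages are affected without creating any files, here is a sketch that writes each page to an in-memory buffer and records the indices whose write raises. find_unwritable_pages is a hypothetical helper. It assumes a reader with getNumPages()/getPage(i) and a writer factory whose objects have addPage()/write(stream), which matches PyPDF2's PdfFileReader and PdfFileWriter. It only reports the problem; it does not fix the PDF.

```python
import io


def find_unwritable_pages(reader, make_writer):
    """Hypothetical helper: return zero-based indices of pages that fail to write alone.

    Assumes `reader` offers getNumPages()/getPage(i) and `make_writer()` returns
    a fresh object with addPage()/write(stream), like PyPDF2's classes.
    """
    bad = []
    for i in range(reader.getNumPages()):
        writer = make_writer()
        writer.addPage(reader.getPage(i))
        try:
            # Serialize to memory only; nothing is written to disk.
            writer.write(io.BytesIO())
        except Exception as exc:  # PyPDF2 can raise several error types here
            print("page %d: %s: %s" % (i, type(exc).__name__, exc))
            bad.append(i)
    return bad
```

With PyPDF2 the call would look like find_unwritable_pages(PdfFileReader(open('test.pdf', 'rb')), PdfFileWriter). If only the last page references the broken object, it should report just that index.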

Answers


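Your script counts pages in several different places, and it isn't clear what each count is for. I believe the way you count backwards is the source of your error.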

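I took your script, first adapted it to Python 2.7 (since that's what I'm running), and then simplified it so that it walks backwards through your source file once to create the reversed file.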

from PyPDF2 import PdfFileWriter, PdfFileReader 

output = PdfFileWriter() 
# rpage = [] removed because it's not needed anymore 
name = raw_input("What's the file called? ") #Changed for the 2.7 environment 

filename = name[:-4] #Simplified, since we know where the piece we want is. 

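input1 = PdfFileReader(name) 
#PdfFileReader opens a filename itself in binary mode; its second positional 
#parameter is strict, not a file mode, so "rb" is not needed here. 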
#Simplified, because I couldn't figure out why it was complex. 

for i in range(input1.getNumPages(),0,-1): 
    #getNumPages counts like a human and gives the total number of pages 
    #This counts backwards, so no need to count forward and use that to 
    #reverse the numbers. 
    output.addPage(input1.getPage(i-1)) 
    #getPage counts like a computer and needs to finish with page 0. 

outputpath = filename + '-reversed.pdf' 

outputStream = open(outputpath, "wb") 
output.write(outputStream) 
outputStream.close() #Closes the file and stream once you're done. 
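For comparison, a small sketch of the zero-based indices each version passes to getPage(). The function names answer_indices and question_indices are hypothetical and exist only for this illustration; each reproduces the index arithmetic of the corresponding loop without touching a PDF. Both yield n-1 down to 0; the rewrite reaches the same sequence in a single loop.

```python
def answer_indices(num_pages):
    # The answer's loop: page numbers n..1, each shifted to the zero-based index getPage() expects.
    return [i - 1 for i in range(num_pages, 0, -1)]


def question_indices(num_pages):
    # The question's bookkeeping: build 1..n, reverse it by index arithmetic, then shift to zero-based.
    pages = list(range(1, num_pages + 1))
    rpage = [pages[num_pages - i - 1] for i in range(num_pages)]
    return [i - 1 for i in rpage]


print(answer_indices(4))    # [3, 2, 1, 0]
print(question_indices(4))  # [3, 2, 1, 0]
```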

I ran the program after changing it back to Python 3 by replacing "raw_input" with "input", and got this error: http://pastebin.com/PgwQvCyQ


On your reader object, try this: input1 = PdfFileReader(name, strict=False). According to that site, there may be a bug in the reader.


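If all you want is to reverse the pages for printing, and you don't care about preserving internal links and annotations, pdfrw may be a better fit for the task than PyPDF2: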

from pdfrw import PdfWriter, PdfReader 

iname = input("What's the file called? ") 
oname = iname.rsplit('.', 1)[0] + '-reversed.pdf' 

output = PdfWriter() 
output.addpages(reversed(PdfReader(iname).pages)) 
output.write(oname) 

Disclaimer: I am the primary author of pdfrw.