2015-02-23

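I want to rotate the pages of a PDF file and then replace the old pages with the rotated ones in that same PDF file. How can I edit a PDF file in place, replacing its data?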

I wrote the following code:

#!/usr/bin/python 

import os 
from pyPdf import PdfFileReader, PdfFileWriter 

my_path = "/home/USER/Desktop/files/" 

input_file_name = os.path.join(my_path, "myfile.pdf") 
input_file = PdfFileReader(file(input_file_name, "rb")) 
input_file.decrypt("MyPassword") 
output_PDF = PdfFileWriter() 

for num_page in range(0, input_file.getNumPages()): 
    page = input_file.getPage(num_page) 
    page.rotateClockwise(270) 
    output_PDF.addPage(page) 

#Trying to replace old data with new data in the original file, not 
#create a new file and add the new data! 
output_file_name = os.path.join(my_path, "myfile.pdf") 
output_file = file(output_file_name, "wb") 
output_PDF.write(output_file) 
output_file.close() 

The code above gives me an error! I've even tried using:

input_file = PdfFileReader(file(input_file_name, "r+b")) 

but that didn't work either...

Changing the line:

output_file_name = os.path.join(my_path, "myfile.pdf") 

to:

output_file_name = os.path.join(my_path, "myfile2.pdf") 

fixes everything, but it's not what I want...

Any help?

The error:

Traceback (most recent call last):
  File "12-5.py", line 22, in <module>
    output_PDF.write(output_file)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 649, in getObject
    retval = readObject(self.stream, self)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 564, in readFromStream
    raise utils.PdfReadError, "Unable to find 'endstream' marker after stream."
pyPdf.utils.PdfReadError: Unable to find 'endstream' marker after stream.

What do you mean by "it didn't work" and "gives an error"? – 2015-02-23 17:33:30

Edited in the error! – midkin 2015-02-23 17:36:43

Answer

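The problem, I suspect, is that PyPDF is still reading from the file while it is being written. Opening myfile.pdf with "wb" truncates it, but PdfFileReader loads objects lazily, so when output_PDF.write() resolves the pages' references (the getObject/readObject calls in the traceback), it is reading from a file that has already been emptied.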
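To make that failure mode concrete, here is a minimal stdlib-only sketch (Python 3). It assumes a plain byte file can stand in for the PDF, and that the reader, like PdfFileReader, keeps its stream open and seeks to object offsets on demand; the function name `read_after_truncate` and the offsets are illustrative only, not pyPdf's API.

```python
import os
import tempfile


def read_after_truncate():
    """Read part of a file, reopen it with "wb", then try a lazy read."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"%PDF-1.4\n1 0 obj ... endstream endobj")
        # Unbuffered, so Python's read-ahead buffer doesn't mask the effect.
        with open(path, "rb", buffering=0) as reader:
            header = reader.read(8)     # read up front, like the file header
            with open(path, "wb"):      # opening for output truncates to 0 bytes
                pass
            reader.seek(9)              # seek to where an object used to start
            obj = reader.read()         # the lazy read finds nothing there
        return header, obj
    finally:
        os.remove(path)
```

The header comes back intact but the "object" read returns empty bytes, which is the same situation pyPdf hits when it later looks for the `endstream` marker.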

As you've already noticed, the right fix is to write to a separate file and then replace the original file with the new one. Something like this:

output_file_name = os.path.join(my_path, "myfile-temporary.pdf") 
output_file = file(output_file_name, "wb") 
output_PDF.write(output_file) 
output_file.close() 
os.rename(output_file_name, input_file_name) 
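The same write-then-rename idea can be wrapped in a small helper. This is a hedged Python 3 sketch, not part of the original answer: `rewrite_in_place` and the `.tmp` suffix are hypothetical names, and it swaps in `os.replace` for `os.rename` because `os.replace` also overwrites an existing destination on Windows, where `os.rename` raises instead.

```python
import os


def rewrite_in_place(path, write_fn):
    """Write new contents for `path` to a side file, then move it over `path`."""
    tmp_path = path + ".tmp"          # hypothetical naming scheme for the side file
    with open(tmp_path, "wb") as tmp:
        write_fn(tmp)                 # e.g. a PdfFileWriter's write method
    os.replace(tmp_path, path)        # overwrites `path`; atomic on POSIX
```

With the question's objects this would be `rewrite_in_place(input_file_name, output_PDF.write)`, since pyPdf's `write()` takes a file object. Because the side-file name is fixed, a crash mid-write leaves `myfile.pdf.tmp` behind, but the original PDF is never touched until the new one is complete.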

I've written some code that simplifies this: https://github.com/shazow/unstdlib.py/blob/master/unstdlib/standard/contextlib_.py#L14

from unstdlib.standard.contextlib_ import open_atomic 

with open_atomic(input_file_name, "wb") as output_file: 
    output_PDF.write(output_file) 

This automatically creates a temporary file, writes to it, and then replaces the original file.
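For readers who can't add the dependency, here is a sketch of a context manager with that behaviour. It is an assumption about how such a helper can work, not the unstdlib implementation: `atomic_open` is a hypothetical name, and it uses a uniquely named `tempfile.mkstemp` file in the target's directory (so the final rename stays on one filesystem) plus cleanup if the block raises.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def atomic_open(path, mode="wb"):
    """Yield a temp file next to `path`; replace `path` only on success."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, mode) as tmp:
            yield tmp
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed: drop the temp file and leave the original untouched.
        os.remove(tmp_path)
        raise
```

Usage mirrors the answer: `with atomic_open(input_file_name) as output_file: output_PDF.write(output_file)`. One side effect on POSIX is that the replaced file ends up with `mkstemp`'s owner-only (0600) permissions.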

Edit: I originally misread the question. What follows is my earlier, incorrect answer, but it may still help others.

Your code is fine and should work without problems on "most" PDFs.

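The problem you're seeing is an incompatibility between PyPDF and the specific PDF you're trying to process. It could be a bug in PyPDF, or the PDF may not be entirely valid.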

There are two things you can try:

  1. See whether PyPDF2 can read the file. Install PyPDF2 with pip install PyPDF2, replace import pyPdf … with import PyPDF2 …, and rerun the script.

  2. Re-encode your PDF with another program and see whether that helps. For example, something like convert bad.pdf bad.ps; convert bad.ps maybe-good.pdf might fix things.

1. Tried it! Lots of error lines, starting with: Traceback (most recent call last): File "12-5.py", line 22, in <module> output_PDF.write(output_file) 2. I don't know how to do that! – midkin 2015-02-23 18:32:48

My apologies, I misunderstood the question. See my updated answer. – 2015-02-23 18:33:26

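OK, os.rename works! But I think the real answer is that what I wanted can't be done this way, because PyPDF is still reading the file while it writes it! :) Still, if anyone needs to do this without ending up with a second PDF saved on their hard drive, os.rename is the way to do it! Since this lets me do what I needed, even if not the way I had in mind, I'll accept it as the correct answer! :) – midkin 2015-02-23 18:43:51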