2016-07-13

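I'm trying to re-save a PDF with Ghostscript (to fix errors that PyPDF2 can't handle). I call Ghostscript through subprocess.check_output, and I want to pass the original PDF in on STDIN and get the new PDF back on STDOUT. How do I export binary data via STDOUT from a Python subprocess command?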

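When I save the PDF to a file and read it back in, it works fine. When I try to take the PDF from STDOUT instead, it doesn't work. I thought it might be an encoding problem, but I don't want to encode anything as text; I just want the binary data. Maybe there's something about encodings that I don't understand.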

How can I make the STDOUT data behave like the file data?

import subprocess 
from PyPDF2 import PdfFileReader 
from io import BytesIO 
import traceback 

input_file_name = "SKMBT_42116071215160 (1).pdf" 
output_file_name = 'saved2.pdf' 
# input_file = open(input_file_name, "rb") # Moved below. 

# Write to a file, then read the file back in. This works. 
try: 
    ps1 = subprocess.check_output(
     ('gs', '-o', output_file_name, '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/prepress', input_file_name), 
     # stdin=input_file # [edit] We pass in the file name, so this only confuses things. 
    ) 
    # I use BytesIO() in this example only to make the examples parallel. 
    # In the other example, I use BytesIO() because I can't pass a string to PdfFileReader(). 
    fakeFile1 = BytesIO() 
    fakeFile1.write(open(output_file_name, "rb").read()) 
    inputpdf = PdfFileReader(fakeFile1) 
    print inputpdf 
except: 
    traceback.print_exc() 

print "---------" 
# input_file.seek(0) # Added to address one comment. Removed while addressing another. 
input_file = open(input_file_name, "rb") 

# Export to STDOUT. This doesn't work. 
try: 
    ps2 = subprocess.check_output(
     ('gs', '-o', '-', '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/prepress', '-'), 
     stdin=input_file, 
     # shell=True # Using shell produces the same error. 
    ) 
    fakeFile2 = BytesIO() 
    fakeFile2.write(ps2) 
    inputpdf = PdfFileReader(fakeFile2) 
    print inputpdf 
except: 
    traceback.print_exc() 

Output:

**** The file was produced by: 
    **** >>>> KONICA MINOLTA bizhub 421 <<<< 
<PyPDF2.pdf.PdfFileReader object at 0x101d1d550> 
--------- 
    **** The file was produced by: 
    **** >>>> KONICA MINOLTA bizhub 421 <<<< 
Traceback (most recent call last): 
    File "pdf_file_reader_test2.py", line 34, in <module> 
    inputpdf = PdfFileReader(fakeFile2) 
    File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1065, in __init__ 
    self.read(stream) 
    File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1774, in read 
    idnum, generation = self.readObjectHeader(stream) 
    File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1638, in readObjectHeader 
    return int(idnum), int(generation) 
ValueError: invalid literal for int() with base 10: "7-8138-11f1-0000-59be60c931e0'" 

On Windows, stdout needs to be put into binary mode, like this: http://stackoverflow.com/questions/2374427/python-2-x-write-binary-output-to-stdout. Not sure whether it helps. Worth a try.

Worth mentioning, but I don't think it's the solution in this case. I'm on OS X, and I'm not aware of a similar setting that I could change. – Travis

Not sure, but is it intended that you don't rewind 'input_file' between the two calls (the one that works and the one that doesn't)?

Answer

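It turns out this has nothing to do with Python. It's a Ghostscript issue. As pointed out in this post: Prevent Ghostscript from writing errors to standard output, Ghostscript writes errors to standard output, which corrupts the file being piped out.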
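A minimal sketch of one way to keep those messages out of the piped PDF, assuming your Ghostscript build honours the documented -q and -sstdout=%stderr switches. The helper names build_gs_command and resave_pdf are made up for this illustration, and whether the switches silence every message on a given build is an assumption to verify locally:

```python
import subprocess


def build_gs_command(gs="gs"):
    """Build a Ghostscript argument list: PDF in on stdin, PDF out on stdout.

    `gs` is the executable name or path (assumed to be on PATH by default).
    """
    return [
        gs,
        "-q",                        # suppress the startup banner
        "-sstdout=%stderr",          # route Ghostscript's own messages to stderr
        "-o", "-",                   # write the device output (the PDF) to stdout
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        "-",                         # read the input from stdin
    ]


def resave_pdf(pdf_bytes, gs="gs"):
    """Pipe pdf_bytes through Ghostscript; return (new_pdf_bytes, messages)."""
    proc = subprocess.Popen(
        build_gs_command(gs),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,      # messages are captured here, apart from the PDF
    )
    out, err = proc.communicate(pdf_bytes)
    if proc.returncode != 0:
        raise RuntimeError("Ghostscript failed: %r" % err)
    return out, err
```

With the names from the question, this could be called as resave_pdf(open(input_file_name, "rb").read()), and the first element of the result written into a BytesIO for PdfFileReader. Popen with communicate is used instead of check_output so that stderr is collected separately rather than left on the terminal.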

Thanks to @Jean-François Fabre, who suggested that I look inside the binary file.
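For that kind of inspection, a rough sketch is to check where the PDF header actually starts in the captured bytes. The function name split_at_pdf_header and the sample bytes are invented for illustration, and this only detects stray text before the header, not text mixed into the middle of the stream:

```python
def split_at_pdf_header(data):
    """Split captured bytes into (text before the first %PDF- marker, the rest).

    If no marker is found, everything is returned as the first element.
    """
    start = data.find(b"%PDF-")
    if start == -1:
        return data, b""
    return data[:start], data[start:]


# Made-up sample: a stray message followed by the start of a PDF.
sample = b"some stray message\n%PDF-1.4\n..."
noise, pdf = split_at_pdf_header(sample)
print(repr(noise))      # anything printed here is not part of the PDF
print(repr(pdf[:8]))    # should begin with the %PDF- header
```

Applied to the ps2 bytes from the question, a non-empty first element would suggest that something other than the PDF was written to STDOUT.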

Please mark this answer as accepted so that the question no longer shows up as unanswered. Maybe also revisit the question? Thanks. – tripleee

When I do that, it says: "You can accept your own answer tomorrow" – Travis
