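I'm trying to re-save a PDF with Ghostscript (to fix errors that PyPDF2 can't handle). I'm calling Ghostscript with `subprocess.check_output`, and I want to pass the original PDF in as STDIN and get the new PDF back as STDOUT. How do I export binary data through STDOUT from a Python subprocess command?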
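When I save the PDF to a file and read it back in, it works fine. When I try to pass the data in from STDOUT, it doesn't work. I thought this might be an encoding problem, but I don't want to encode anything as text; I just want the binary data. Maybe there's something about encoding I don't understand.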
How can I make the STDOUT data behave like the file data?
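One way to check the encoding hypothesis is to pipe known bytes through a child process and compare. The sketch below is a Python 3 illustration (it relies on the `input=` argument of `check_output`, available since Python 3.4) that uses a child Python process as a stand-in for Ghostscript; `roundtrip` is a hypothetical helper name. It suggests that `check_output` hands back stdout as raw bytes, including `\r\n`, NUL, and high bytes, with no text decoding unless `universal_newlines=`/`text=` is passed.

```python
import subprocess
import sys

# Child program: copy binary stdin to binary stdout without any decoding.
COPY_STDIN = "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"


def roundtrip(data):
    """Pipe `data` through a child process and return its stdout as bytes."""
    # No text=/universal_newlines= argument, so nothing is decoded.
    return subprocess.check_output([sys.executable, "-c", COPY_STDIN], input=data)
```

If `roundtrip(payload) == payload` holds for arbitrary binary payloads, any difference between the Ghostscript STDOUT bytes and the file is more likely to come from what Ghostscript itself writes to stdout than from an encoding step in Python.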
```python
import subprocess
from PyPDF2 import PdfFileReader
from io import BytesIO
import traceback

input_file_name = "SKMBT_42116071215160 (1).pdf"
output_file_name = 'saved2.pdf'
# input_file = open(input_file_name, "rb")  # Moved below.

# Write to a file, then read the file back in. This works.
try:
    ps1 = subprocess.check_output(
        ('gs', '-o', output_file_name, '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/prepress', input_file_name),
        # stdin=input_file  # [edit] We pass in the file name, so this only confuses things.
    )
    # I use BytesIO() in this example only to make the examples parallel.
    # In the other example, I use BytesIO() because I can't pass a string to PdfFileReader().
    fakeFile1 = BytesIO()
    fakeFile1.write(open(output_file_name, "rb").read())
    inputpdf = PdfFileReader(fakeFile1)
    print inputpdf
except:
    traceback.print_exc()

print "---------"

# input_file.seek(0)  # Added to address one comment. Removed while addressing another.
input_file = open(input_file_name, "rb")

# Export to STDOUT. This doesn't work.
try:
    ps2 = subprocess.check_output(
        ('gs', '-o', '-', '-sDEVICE=pdfwrite', '-dPDFSETTINGS=/prepress', '-'),
        stdin=input_file,
        # shell=True  # Using shell produces the same error.
    )
    fakeFile2 = BytesIO()
    fakeFile2.write(ps2)
    inputpdf = PdfFileReader(fakeFile2)
    print inputpdf
except:
    traceback.print_exc()
```
Output:

```
**** The file was produced by:
**** >>>> KONICA MINOLTA bizhub 421 <<<<
<PyPDF2.pdf.PdfFileReader object at 0x101d1d550>
---------
**** The file was produced by:
**** >>>> KONICA MINOLTA bizhub 421 <<<<
Traceback (most recent call last):
  File "pdf_file_reader_test2.py", line 34, in <module>
    inputpdf = PdfFileReader(fakeFile2)
  File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1065, in __init__
    self.read(stream)
  File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1774, in read
    idnum, generation = self.readObjectHeader(stream)
  File "/Library/Python/2.7/site-packages/PyPDF2/pdf.py", line 1638, in readObjectHeader
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: "7-8138-11f1-0000-59be60c931e0'"
```
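The `ValueError` suggests PyPDF2 jumped to a byte offset from the cross-reference table and found what looks like a fragment of a UUID instead of an object header, i.e. the offsets no longer match the bytes. One plausible cause, stated here as an assumption to verify: without `-q`, Ghostscript may print informational messages (such as page-progress lines) to stdout. In the file-based run those lines are captured into `ps1` and ignored; in the STDOUT run they would be captured into `ps2` together with the PDF, shifting every offset. The `****` warnings still appear on the terminal because they go to stderr, which `check_output` does not capture. The helpers below (`gs_command`, `resave_pdf`, `leading_junk` are hypothetical names) sketch a pipe-only call with `-q` and a check for stray bytes before the `%PDF-` header. They assume a Ghostscript executable named `gs` on PATH and use `Popen.communicate`, which works on Python 2.7 and 3.

```python
import subprocess


def gs_command(gs="gs"):
    """Ghostscript argv that reads a PDF on stdin and writes a PDF to stdout.

    -q asks Ghostscript not to print informational messages, so they
    should not end up mixed into the captured PDF bytes.
    """
    return [gs, "-q", "-o", "-", "-sDEVICE=pdfwrite",
            "-dPDFSETTINGS=/prepress", "-"]


def resave_pdf(pdf_bytes, gs="gs"):
    """Run Ghostscript on in-memory PDF bytes and return the output bytes."""
    cmd = gs_command(gs)
    # stderr is not redirected, so warnings still show on the terminal.
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = proc.communicate(pdf_bytes)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd, output=out)
    return out


def leading_junk(data):
    """Return any bytes that precede the %PDF- header (b"" if there are none)."""
    start = data.find(b"%PDF-")
    if start < 0:
        raise ValueError("no %PDF- header in output")
    return data[:start]
```

With the question's variables, the result could be read with `PdfFileReader(BytesIO(resave_pdf(input_file.read())))`. Calling `leading_junk(ps2)` on the original output is a quick diagnostic: a non-empty result supports the stray-message explanation, while an empty one does not rule out text written later in the stream.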
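On Windows, stdout has to be switched to binary mode, as shown here: http://stackoverflow.com/questions/2374427/python-2-x-write-binary-output-to-stdout. Not sure it helps, but it's worth a try. –
Worth mentioning, but I don't think that's the solution in this case. I'm on OS X, and I don't know of an equivalent setting I could change. – Travis
Not sure, but is it intended that you don't rewind `input_file` between the two calls (the one that works and the one that doesn't)? –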