2012-02-21 67 views
10

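"Not implemented" exception when using pywin32 to control Adobe Acrobat

I wrote a Python script that uses pywin32 to save PDF files as text. It worked fine until recently. I use a similar approach with Excel. The code is below: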

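import os
import traceback
import win32com.client

def __pdf2Txt(self, pdf, fileformat="com.adobe.acrobat.accesstext"):
    # Write the text file next to the source PDF, with the same base name.
    outputLoc = os.path.dirname(pdf)
    outputLoc = os.path.join(outputLoc, os.path.splitext(os.path.basename(pdf))[0] + '.txt')

    # Pre-bind the names so the cleanup in `finally` cannot hit an unbound variable.
    adobe = pdDoc = jObject = None
    try:
        win32com.client.gencache.EnsureModule('{E64169B3-3592-47d2-816E-602C5C13F328}', 0, 1, 1)
        adobe = win32com.client.DispatchEx('AcroExch.App')
        pdDoc = win32com.client.DispatchEx('AcroExch.PDDoc')
        pdDoc.Open(pdf)
        jObject = pdDoc.GetJSObject()
        jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
        return True
    except Exception:
        traceback.print_exc()
        return False
    finally:
        # Release the COM objects even if one of the steps above failed.
        del jObject
        if pdDoc is not None:
            pdDoc.Close()
        del pdDoc
        if adobe is not None:
            adobe.Exit()
        del adobe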

But this code suddenly stopped working, and I now get the following output:

Traceback (most recent call last): 
    File "C:\Documents and Settings\ablishen\workspace\HooverKeyCreator\src\HooverKeyCreator.py", line 38, in __pdf2Txt 
    jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext") 
    File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 505, in __getattr__ 
    ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1) 
com_error: (-2147467263, 'Not implemented', None, None) 
False 

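Similar code that I wrote in VB works correctly, so I am guessing the problem is that the COM interface is not binding properly to the appropriate function? (My COM knowledge is patchy.)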
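As a reading aid for the traceback above: the first item in the `com_error` tuple is an HRESULT printed as a signed 32-bit integer. A minimal sketch that shows it in the hexadecimal form usually seen in Windows documentation; the helper name `hresult_to_hex` is made up for illustration and is not part of pywin32:

```python
def hresult_to_hex(hresult):
    """Format a signed 32-bit HRESULT as an unsigned hexadecimal string."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned 32-bit.
    return "0x{:08X}".format(hresult & 0xFFFFFFFF)


# The code reported in the traceback; 0x80004001 is E_NOTIMPL ("Not implemented").
print(hresult_to_hex(-2147467263))  # prints 0x80004001
```

So the error is the generic COM "not implemented" result rather than anything specific to Acrobat.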

+2

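Does this PDF have Save usage rights? (A guess based on the documentation: "This method is available in Adobe Reader for documents that have Save usage rights.") – 2012-02-21 19:13:53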

+1

It seems not, but I enabled them and still get the same error. Also, I am using Adobe Acrobat. – Blish 2012-02-22 11:05:53

Answers

3

Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html

I admit that the post above is not the easiest to find (probably because Google ranks it low based on the age of the content).

Specifically, applying the suggestion in this message will get your code running: https://mail.python.org/pipermail/python-win32/2002-March/000265.html
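Going by the comments in the script below and the follow-up remarks on this answer, the suggestion amounts to adding `E_NOTIMPL` to the `ERRORS_BAD_CONTEXT` list that win32com's `dynamic.py` consults. A rough sketch of applying that change temporarily rather than for the whole process; `also_treat_as_bad_context` is a hypothetical helper, and the list is passed in as a parameter so that the sketch itself runs without pywin32 installed:

```python
from contextlib import contextmanager

# HRESULT from the traceback in the question: (-2147467263, 'Not implemented').
E_NOTIMPL = -2147467263


@contextmanager
def also_treat_as_bad_context(error_list, hresult=E_NOTIMPL):
    """Temporarily add `hresult` to `error_list`, restoring the list on exit."""
    added = hresult not in error_list
    if added:
        error_list.append(hresult)
    try:
        yield error_list
    finally:
        # Only undo what this context manager added itself.
        if added:
            error_list.remove(hresult)


# Stand-in list for demonstration; with pywin32 installed one would pass
# win32com.client.dynamic.ERRORS_BAD_CONTEXT instead.
demo_errors = []
with also_treat_as_bad_context(demo_errors):
    print(E_NOTIMPL in demo_errors)  # True while the block is active
print(E_NOTIMPL in demo_errors)      # False once the block has exited
```

With pywin32 available, the `SaveAs` call would presumably go inside the `with` block.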

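For reference, here is the complete code, which does not require you to patch dynamic.py by hand (the snippet should run pretty much out of the box):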

# Finds every file under ROOT_INPUT_PATH that matches INPUT_FILE_EXTENSION and extracts its text into ROOT_OUTPUT_PATH, keeping the input file name but replacing its extension with OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch 
from win32com.client.dynamic import ERRORS_BAD_CONTEXT 

import winerror 

# try importing scandir and, if found, use it, as it can be much faster than the stock os.walk
try: 
    from scandir import walk 
except ImportError: 
    from os import walk 

import fnmatch 

import sys 
import os 

ROOT_INPUT_PATH = None 
ROOT_OUTPUT_PATH = None 
INPUT_FILE_EXTENSION = "*.pdf" 
OUTPUT_FILE_EXTENSION = ".txt" 

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext): 
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat 

    # Open the input file (as a pdf) 
    ret = avDoc.Open(f_path, f_path) 
    assert(ret) # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?

    pdDoc = avDoc.GetPDDoc() 

    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext))) 

    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference" 
    jsObject = pdDoc.GetJSObject() 

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml" 
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext") 

    pdDoc.Close() 
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs) 
    del pdDoc 

if __name__ == "__main__": 
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension> 

    #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt' 

    ROOT_INPUT_PATH = sys.argv[1] 
    INPUT_FILE_EXTENSION = sys.argv[2] 
    ROOT_OUTPUT_PATH = sys.argv[3] 
    OUTPUT_FILE_EXTENSION = sys.argv[4] 

    # tuples are of schema (path_to_file, filename) 
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION)) 

    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (the imported name refers to the same list object that dynamic.py checks, so appending in place is enough)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)

I have tested this with WinPython 64 2.7.6.3 and Acrobat X Pro.
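The file-discovery step of the script above can also be tried on its own, without Acrobat. A small sketch under that assumption; `iter_matching_files` and `output_path` are illustrative names that do not appear in the original script, and only the standard library's `os.walk` is used rather than `scandir`:

```python
import fnmatch
import os


def iter_matching_files(root, pattern="*.pdf"):
    """Yield (full_path, name_without_extension) for files under root matching pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename), os.path.splitext(filename)[0]


def output_path(out_dir, basename, ext=".txt"):
    """Build the destination path the same way the script does for its output files."""
    return os.path.join(out_dir, basename + ext)
```

Because only the base name is kept, two PDFs with the same name in different subfolders would map to the same output file, which is worth checking before running a large batch.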

+1

Adding winerror.E_NOTIMPL to the ERRORS_BAD_CONTEXT list in dynamic.py did the trick. Thank you very much! – Blish 2014-10-29 09:23:25

+1

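Hi, I am using Python and Acrobat Reader Pro to do the same thing, and even after doing what the previous commenter did, this code currently gives me the following error: "NotAllowedError: Security settings prevent access to this property or method". Do you know what is causing it? Thanks – dasen 2014-11-13 15:21:07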

+2

I can't thank you enough for the `ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)` line. – Fenikso 2014-11-14 08:20:40

1

makepy.py is a script that ships with the win32com Python package.

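Running it for your installation "wires" Python up to the COM/OLE objects in Windows. Below is an excerpt of code I use to talk to Excel and do a few things in it. This example gets the name of sheet 1 in the current workbook. If it hits an exception, it runs makepy automatically: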

import os
import re
import sys

import win32com.client
from win32com.client import selecttlb

def attachExcelCOM():
    # Regenerate the makepy wrappers for every registered Excel type library.
    makepyExe = r'python C:\Python25\Lib\site-packages\win32com\client\makepy.py'
    for tl in selecttlb.EnumTlbs():
        if re.match('^Microsoft.*Excel.*', tl.desc, re.IGNORECASE):
            makepyCmd = '%s -d "%s"' % (makepyExe, tl.desc)
            os.system(makepyCmd)

def getSheetName(sheetNum):
    try:
        xl = win32com.client.Dispatch("Excel.Application")
        wb = xl.Workbooks.Item(sheetNum)
    except Exception as detail:
        print('There was a problem attaching to Excel, refreshing connect config...')
        print(str(detail))
        attachExcelCOM()
        # Retry once now that the wrappers have been regenerated.
        try:
            xl = win32com.client.Dispatch("Excel.Application")
            wb = xl.Workbooks.Item(sheetNum)
        except Exception:
            print('Could not attach to Excel...')
            sys.exit(-1)

    wsName = wb.Name
    if wsName == 'PERSONAL.XLS':
        return None
    print('The target worksheet is:')
    print('   ' + wsName)
    sys.stdout.write('Is this correct? [Y/N] ')
    sys.stdout.flush()
    answer = sys.stdin.readline().strip().upper()
    if answer != 'Y':
        print('Sheet not identified correctly.')
        return None
    return (wb, wsName)

# -- Main --
sheetNum = 1  # assumed value: the excerpt does not show where sheetNum is set
sheetInfo = getSheetName(sheetNum)
if sheetInfo is None:
    print('Sheet not found')
    sys.exit(-1)
else:
    (wb, wsName) = sheetInfo
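
The filter inside `attachExcelCOM` is a case-insensitive regular expression applied to each type library description. A minimal sketch of that filter in isolation, so it can be checked without Excel or pywin32; the function name and the sample descriptions are made up for illustration and are not output from a real machine:

```python
import re

# Same pattern as in the excerpt above, compiled once.
EXCEL_TLB_PATTERN = re.compile(r'^Microsoft.*Excel.*', re.IGNORECASE)


def select_excel_descriptions(descriptions):
    """Return the descriptions that the Excel pattern matches."""
    return [desc for desc in descriptions if EXCEL_TLB_PATTERN.match(desc)]


# Hypothetical descriptions, for illustration only.
samples = [
    "Microsoft Excel 12.0 Object Library",
    "Microsoft Word 12.0 Object Library",
    "microsoft excel 5.0 object library",
]
print(select_excel_descriptions(samples))
```

The pattern is anchored at the start, so a description has to begin with "Microsoft" (in any letter case) to be selected.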