2012-07-05 51 views
7

My .NET application needs to convert PDF documents to Word format programmatically. How can I convert a PDF to Word using the Acrobat SDK?

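After evaluating several products I found Acrobat X Pro, which has a Save As option that can save a document in Word/Excel format. I tried to use the Acrobat SDK, but I could not find proper documentation on where to start.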

I looked at their IAC samples, but I did not understand how to invoke the menu item so that it performs the Save As operation.

Answers

0

Adobe does not support PDF to Word conversion unless you use their Acrobat PDF client. Meaning you cannot do it with their SDK or by calling a command line. You can only do it manually.

+0

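Either of the solutions posted by JLE or by me achieves this programmatically. If you have Acrobat X Pro installed, you can try my script; it should work out of the box once you have installed WinPython 64 2.7.6.3 (which is free) – Subhobroto 2014-10-28 04:13:10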

13

You can do this with Acrobat X Pro, but you need to use the JavaScript API from C#.

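using System; 
using System.Globalization; 
using System.Reflection; 
using Acrobat; // assumes a COM reference to the Acrobat type library 

AcroPDDoc pdfd = new AcroPDDoc(); 
pdfd.Open(sourceDoc.FileFullPath); 
Object jsObj = pdfd.GetJSObject(); 
Type jsType = jsObj.GetType(); // type of the JSObject, used for the late-bound call below 
//have to use acrobat javascript api because, acrobat 
object[] saveAsParam = { "newFile.doc", "com.adobe.acrobat.doc", "", false, false }; 
jsType.InvokeMember("saveAs", BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance, null, jsObj, saveAsParam, CultureInfo.InvariantCulture); 
pdfd.Close(); // release the lock on the source file 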

Hope that helps.

+0

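Hi, I did the same thing.. thanks for your answer. But it looks like this process takes a long time to complete. If I have to convert 1000 files, it takes more than 5 to 6 hours.. Is there a faster way? – 2016-04-04 10:04:37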
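
For a batch of that size, one thing that may help is making the run resumable, so that an interrupted job does not convert the same files again. This does not make Acrobat itself any faster. The sketch below is a hedged illustration only: `pending_conversions` and its parameters are hypothetical names, not part of the Acrobat SDK, and it assumes each output file is named after its input file.

```python
import os


def pending_conversions(pdf_paths, output_dir, output_ext=".docx"):
    """Yield (source, destination) pairs whose destination does not exist yet.

    Hypothetical helper: assumes the output file keeps the input file's
    base name and lives directly in output_dir.
    """
    for src in pdf_paths:
        base = os.path.splitext(os.path.basename(src))[0]
        dst = os.path.join(output_dir, base + output_ext)
        if not os.path.exists(dst):
            yield src, dst
```

Each yielded pair could then be handed to whatever routine performs the actual saveAs call.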

+0

I added pdfd.Close() at the end to unlock the file. – r03 2017-06-26 14:05:50
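
The same concern applies when driving Acrobat from Python: if the save call raises, the document may stay open and the file stays locked. A minimal sketch, assuming only that the document object exposes a `Close()` method (as the PDDoc object used in this thread does); `closing_acrobat_doc` is a hypothetical helper name.

```python
from contextlib import contextmanager


@contextmanager
def closing_acrobat_doc(doc):
    """Yield doc and always call its Close() method afterwards,
    even if the body of the with-block raises."""
    try:
        yield doc
    finally:
        doc.Close()


# Hypothetical usage, assuming pywin32 and Acrobat are installed:
# from win32com.client import Dispatch
# with closing_acrobat_doc(Dispatch("AcroExch.PDDoc")) as pd_doc:
#     pd_doc.Open(r"C:\input\file.pdf")
#     pd_doc.GetJSObject().saveAs(r"C:\output\file.docx", "com.adobe.acrobat.docx")
```

Any exception still propagates to the caller; the helper only guarantees that the close happens first.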

1

I did something very similar using WinPython 64 2.7.6.3 and Acrobat X Pro, using the JSObject interface to convert PDF files to DOCX. It is basically the same solution as jle's.

Below is a complete piece of code that converts a set of PDFs to DOCX:

# gets all files under ROOT_INPUT_PATH matching INPUT_FILE_EXTENSION and tries to convert them into ROOT_OUTPUT_PATH, using the same filename as the input file but with the input extension replaced by OUTPUT_FILE_EXTENSION 
from win32com.client import Dispatch 
from win32com.client.dynamic import ERRORS_BAD_CONTEXT 

import winerror 

# try importing scandir and, if found, use it, as it is considerably faster than the stock os.walk 
try: 
    from scandir import walk 
except ImportError: 
    from os import walk 

import fnmatch 

import sys 
import os 

ROOT_INPUT_PATH = None 
ROOT_OUTPUT_PATH = None 
INPUT_FILE_EXTENSION = "*.pdf" 
OUTPUT_FILE_EXTENSION = ".docx" 

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext): 
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat 

    # Open the input file (as a pdf) 
    ret = avDoc.Open(f_path, f_path) 
    assert ret # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice? 

    pdDoc = avDoc.GetPDDoc() 

    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext))) 

    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference" 
    jsObject = pdDoc.GetJSObject() 

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml" 
    jsObject.SaveAs(dst, "com.adobe.acrobat.docx") # NOTE: If you want to save the file as a .doc, use "com.adobe.acrobat.doc" 

    pdDoc.Close() 
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs) 
    del pdDoc 

if __name__ == "__main__": 
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension> 

    #$ python get.docx.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.docx' # NOTE: If you want to save the file as a .doc, use '.doc' instead of '.docx' here and ensure you use "com.adobe.acrobat.doc" in the jsObject.SaveAs call 

    ROOT_INPUT_PATH = sys.argv[1] 
    INPUT_FILE_EXTENSION = sys.argv[2] 
    ROOT_OUTPUT_PATH = sys.argv[3] 
    OUTPUT_FILE_EXTENSION = sys.argv[4] 

    # tuples are of schema (path_to_file, filename) 
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION)) 

    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html 
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL) 

    for filename_with_path, filename_without_extension in matching_files: 
        print("Processing '{}'".format(filename_without_extension)) 
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION) 
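
As the comments in the script note, the output extension and the conversion ID passed to SaveAs have to be changed together. One way to keep them in sync is to derive the ID from the output filename. The following is only a sketch: `CONVERSION_IDS` and `conversion_id_for` are hypothetical names, and the table contains only the three IDs mentioned in this thread, not a complete list.

```python
import os

# Conversion IDs that appear in this thread; Acrobat supports others
# that are not listed here.
CONVERSION_IDS = {
    ".doc": "com.adobe.acrobat.doc",
    ".docx": "com.adobe.acrobat.docx",
    ".xml": "com.adobe.acrobat.xml",
}


def conversion_id_for(output_path):
    """Return the conversion ID matching the extension of output_path."""
    ext = os.path.splitext(output_path)[1].lower()
    try:
        return CONVERSION_IDS[ext]
    except KeyError:
        raise ValueError("no known conversion ID for extension %r" % ext)
```

With this in place, the save call in the script could read jsObject.SaveAs(dst, conversion_id_for(dst)) instead of hard-coding the ID.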