2016-07-05

VBA: find text in a PDF file, copy it and paste it into a spreadsheet

I'm using VBA to automate extracting text from PDF files into an .xls spreadsheet.

The text is always the same: "Price X", "Price Y", "Price Z".

I need to find them, copy them and paste them into the spreadsheet.

Does anyone have an idea how I could automate this process in VBA?

I haven't found a related topic yet.


There is quite a bit of information out there on searching text in a PDF. Take a look [here](https://acrobatusers.com/forum/general-acrobat-topics/search-text-pdf-vba-only-adobe-reader-installed/), [here](http://www.mrexcel.com/forum/excel-questions/613460-searching-pdf-using-excel-2010-visual-basic-applications.html) and [here](http://www.myengineeringworld.net/2014/05/pdf-search-through-vba.html). For more specific help, you'll have to write your own code, post it, and ask about the specific part that doesn't work. – PeterT


This will be my first time automating PDFs with VBA, and so far I haven't tried anything specific. I'll check the links and write my code. –


Which references should be enabled in the VBA project? –

Answers


I think your best option is to convert the PDF to a text file (Save As text) and import the text file into Excel.

You can google how to do this; it's easy, and it will be a great learning exercise. If you have any other questions, post back.
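A minimal sketch of the second half of this approach: reading an exported .txt file and copying the "Price X", "Price Y" and "Price Z" values into a worksheet. It assumes each label is followed by its value on the same line (for example `Price X: 12.50`), that the text file is ANSI/ASCII-encoded, and that a worksheet named "Prices" exists. The names `ReadTextFile`, `FindPriceValue`, `ImportPricesFromTextFile` and the "Prices" sheet are hypothetical, not part of any library.

```vba
Option Explicit

'Hypothetical helper: reads a whole text file into one string.
'Assumes an ANSI/ASCII file; UTF-16 text would need a different reader.
Function ReadTextFile(ByVal path As String) As String
    Dim f As Integer
    Dim content As String
    f = FreeFile
    Open path For Binary Access Read As #f
    content = Space$(LOF(f))
    If Len(content) > 0 Then Get #f, , content
    Close #f
    ReadTextFile = content
End Function

'Hypothetical helper: returns the text that follows the first occurrence of
'label on a line (case-insensitive), with a leading colon and spaces removed.
'Returns "" when the label is not found.
Function FindPriceValue(ByVal txt As String, ByVal label As String) As String
    Dim lines() As String
    Dim i As Long, p As Long
    Dim result As String

    'Normalize CRLF / CR line endings to LF before splitting into lines.
    txt = Replace(txt, vbCrLf, vbLf)
    txt = Replace(txt, vbCr, vbLf)
    lines = Split(txt, vbLf)

    For i = LBound(lines) To UBound(lines)
        p = InStr(1, lines(i), label, vbTextCompare)
        If p > 0 Then
            result = Trim$(Mid$(lines(i), p + Len(label)))
            If Left$(result, 1) = ":" Then result = Trim$(Mid$(result, 2))
            FindPriceValue = result
            Exit Function
        End If
    Next i
    FindPriceValue = ""
End Function

'Hypothetical driver: writes the file path in column A and the three prices
'in columns B:D of targetRow on a sheet assumed to be named "Prices".
Sub ImportPricesFromTextFile(ByVal txtPath As String, ByVal targetRow As Long)
    Dim labels As Variant
    Dim txt As String
    Dim j As Long

    labels = Array("Price X", "Price Y", "Price Z")
    txt = ReadTextFile(txtPath)

    With ThisWorkbook.Worksheets("Prices")
        .Cells(targetRow, 1).Value = txtPath
        For j = LBound(labels) To UBound(labels)
            .Cells(targetRow, j + 2).Value = FindPriceValue(txt, labels(j))
        Next j
    End With
End Sub
```

Calling `ImportPricesFromTextFile` once per converted file, with an increasing row number, builds one row per PDF. Matching on a whole line is a simplification; if the exported text splits a label and its value across lines, the parsing would need adjusting.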


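I think that's a good idea. The only problem is the number of PDF files that have to be converted (about 400), plus importing the text into Excel. I'll google how to do it and try to find an easier method. –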


I've been looking into the API; it's a very difficult way to code. I still think the API could make importing the text into Excel easy. –


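I still think the API could easily import the text into Excel using the FindWindow, SetForegroundWindow, SendMessage and PostMessage functions. Do you have an example of these functions working with a PDF file? Sorry for the extra comment; I couldn't edit the previous one. –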


If you have Adobe Acrobat installed (the full Acrobat, not just Adobe Reader), you can convert all the PDF files to Excel files.

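Besides the main procedure, I also wrote a loop so that several PDF files can be converted at once. So, if you have a folder containing PDF files, you can use this tool to get their file paths, and then use the attached workbook to convert them to a different format. The code actually uses Adobe Professional's "Save As" command to save the files in the desired format. The available formats are: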

eps 
html and htm 
jpeg, jpg and jpe 
jpf, jpx, jp2, j2k, j2c and jpc 
docx 
doc 
png 
ps 
rtf 
xlsx 
xls 
txt 
tiff and tif 
xml 

VBA code

Option Explicit 
Option Private Module 

Sub SavePDFAsOtherFormat(PDFPath As String, FileExtension As String) 

    'Saves a PDF file as another format using Adobe Professional. 

    'By Christos Samaras 
    'http://www.myengineeringworld.net 

    'To use the macro you must enable the Acrobat library in the VBA editor:
    'go to Tools -> References -> Adobe Acrobat xx.0 Type Library, where xx depends
    'on the Acrobat Professional version (e.g. 9.0 or 10.0) installed on your PC.

    'Alternatively, use Tools -> References -> Browse and select the file
    'C:\Program Files\Adobe\Acrobat xx.0\Acrobat\acrobat.tlb
    'where xx is your Acrobat version (e.g. 9.0 or 10.0).

    Dim objAcroApp  As Acrobat.AcroApp 
    Dim objAcroAVDoc As Acrobat.AcroAVDoc 
    Dim objAcroPDDoc As Acrobat.AcroPDDoc 
    Dim objJSO   As Object 
    Dim boResult  As Boolean 
    Dim ExportFormat As String 
    Dim NewFilePath  As String 

    'Check if the file exists. 
    If Dir(PDFPath) = "" Then 
     MsgBox "Cannot find the PDF file!" & vbCrLf & "Check the PDF path and retry.", _ 
       vbCritical, "File Path Error" 
     Exit Sub 
    End If 

    'Check if the input file is a PDF file. 
    If LCase(Right(PDFPath, 4)) <> ".pdf" Then 
     MsgBox "The input file is not a PDF file!", vbCritical, "File Type Error" 
     Exit Sub 
    End If 

    'Initialize Acrobat by creating App object. 
    Set objAcroApp = CreateObject("AcroExch.App") 

    'Set AVDoc object. 
    Set objAcroAVDoc = CreateObject("AcroExch.AVDoc") 

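    'Open the PDF file and stop if Acrobat cannot open it.
    boResult = objAcroAVDoc.Open(PDFPath, "")
    If Not boResult Then
     MsgBox "Acrobat could not open the PDF file!", vbCritical, "File Open Error"
     boResult = objAcroApp.Exit
     Exit Sub
    End If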

    'Set the PDDoc object. 
    Set objAcroPDDoc = objAcroAVDoc.GetPDDoc 

    'Get the JavaScript object (JSO) of the document.
    Set objJSO = objAcroPDDoc.GetJSObject 

    'Check the type of conversion. 
    Select Case LCase(FileExtension) 
     Case "eps": ExportFormat = "com.adobe.acrobat.eps" 
     Case "html", "htm": ExportFormat = "com.adobe.acrobat.html" 
     Case "jpeg", "jpg", "jpe": ExportFormat = "com.adobe.acrobat.jpeg" 
     Case "jpf", "jpx", "jp2", "j2k", "j2c", "jpc": ExportFormat = "com.adobe.acrobat.jp2k" 
     Case "docx": ExportFormat = "com.adobe.acrobat.docx" 
     Case "doc": ExportFormat = "com.adobe.acrobat.doc" 
     Case "png": ExportFormat = "com.adobe.acrobat.png" 
     Case "ps": ExportFormat = "com.adobe.acrobat.ps" 
     Case "rtf": ExportFormat = "com.adobe.acrobat.rtf" 
     Case "xlsx": ExportFormat = "com.adobe.acrobat.xlsx" 
     Case "xls": ExportFormat = "com.adobe.acrobat.spreadsheet" 
     Case "txt": ExportFormat = "com.adobe.acrobat.accesstext" 
     Case "tiff", "tif": ExportFormat = "com.adobe.acrobat.tiff" 
     Case "xml": ExportFormat = "com.adobe.acrobat.xml-1-00" 
     Case Else: ExportFormat = "Wrong Input" 
    End Select 

    'Check if the format is correct and there are no errors. 
    If ExportFormat <> "Wrong Input" And Err.Number = 0 Then 

     'Format is correct and no errors. 

     'Build the path of the new file. For "xls", Adobe writes an XML spreadsheet,
     'so the extension is changed to xml in that case. The last 4 characters
     '(".pdf", in any letter case) are replaced by the new extension.
     If LCase(FileExtension) <> "xls" Then
      NewFilePath = Left(PDFPath, Len(PDFPath) - 4) & "." & LCase(FileExtension)
     Else
      NewFilePath = Left(PDFPath, Len(PDFPath) - 4) & ".xml"
     End If

     'Save PDF file to the new format. 
     boResult = objJSO.SaveAs(NewFilePath, ExportFormat) 

     'Close the PDF file without saving the changes. 
     boResult = objAcroAVDoc.Close(True) 

     'Close the Acrobat application. 
     boResult = objAcroApp.Exit 

     'Inform the user that the conversion was successful.
     MsgBox "The PDF file:" & vbNewLine & PDFPath & vbNewLine & vbNewLine & _
     "Was saved as: " & vbNewLine & NewFilePath, vbInformation, "Conversion finished successfully"

    Else 

     'Something went wrong, so close the PDF file and the application. 

     'Close the PDF file without saving the changes. 
     boResult = objAcroAVDoc.Close(True) 

     'Close the Acrobat application. 
     boResult = objAcroApp.Exit 

     'Inform the user that something went wrong. 
     MsgBox "Something went wrong!" & vbNewLine & "The conversion of the following PDF file FAILED:" & _ 
     vbNewLine & PDFPath, vbInformation, "Conversion failed" 

    End If 

    'Release the objects. 
    Set objAcroPDDoc = Nothing 
    Set objAcroAVDoc = Nothing 
    Set objAcroApp = Nothing 

End Sub 

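Here is the macro that loops through all the file paths in column "B" of the sheet "Paths" and converts the PDF files into a different file type. The ExportAllPDFs macro uses the SavePDFAsOtherFormatNoMsg macro, which is similar to SavePDFAsOtherFormat but without the message boxes.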

Sub ExportAllPDFs() 

    'Convert all the PDF files whose paths are in column B of
    'the worksheet "Paths" into a different file format.
    'By Christos Samaras 
    'http://www.myengineeringworld.net 

    Dim FileFormat As String 
    Dim LastRow As Long 
    Dim i As Long

    'Change this according to your own needs. 
    'Available formats: eps, html, htm, jpeg, jpg, jpe, jpf, jpx, jp2,
    'j2k, j2c, jpc, docx, doc, png, ps, rtf, xlsx, xls, txt, tiff, tif and xml.
    'In this example the PDF file will be saved as text file. 
    FileFormat = "txt" 

    If FileFormat = "" Then
     MsgBox "No output file format was specified!", vbInformation, "File format missing"
     Exit Sub
    End If

    shPaths.Activate 

    'Find the last row. 
    With shPaths 
     LastRow = .Cells(.Rows.Count, "B").End(xlUp).Row 
    End With 

    'Check that there are available file paths. 
    If LastRow < 2 Then 
     shPaths.Range("B2").Select 
     MsgBox "There are no file paths to convert!", vbInformation, "File paths missing" 
     Exit Sub 
    End If 

    'For each cell in the range "B2:B" & last row convert the pdf file 
    'into different format (here to text - txt). 
    For i = 2 To LastRow 
     SavePDFAsOtherFormatNoMsg shPaths.Cells(i, 2).Value, FileFormat
    Next i 

    'Inform the user that conversion finished. 
    MsgBox "The conversion of all files has finished!", vbInformation, "Finished"

End Sub 
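The answer calls SavePDFAsOtherFormatNoMsg but does not include it. Below is one possible sketch, assuming it only needs to do what SavePDFAsOtherFormat does without message boxes; it returns True/False instead, which still works when called as a statement from ExportAllPDFs. It uses late binding (`As Object` plus `CreateObject`), so no Acrobat type-library reference is required, although the full Adobe Acrobat must still be installed. The helper names `AdobeConversionID` and `OutputPathFor` are hypothetical; the conversion IDs are the ones listed in SavePDFAsOtherFormat, using `com.adobe.acrobat.rtf` for rtf.

```vba
Option Explicit

'Hypothetical helper: maps a file extension to an Acrobat conversion ID.
'Returns "" for unsupported extensions.
Function AdobeConversionID(ByVal ext As String) As String
    Select Case LCase$(ext)
        Case "eps": AdobeConversionID = "com.adobe.acrobat.eps"
        Case "html", "htm": AdobeConversionID = "com.adobe.acrobat.html"
        Case "jpeg", "jpg", "jpe": AdobeConversionID = "com.adobe.acrobat.jpeg"
        Case "jpf", "jpx", "jp2", "j2k", "j2c", "jpc": AdobeConversionID = "com.adobe.acrobat.jp2k"
        Case "docx": AdobeConversionID = "com.adobe.acrobat.docx"
        Case "doc": AdobeConversionID = "com.adobe.acrobat.doc"
        Case "png": AdobeConversionID = "com.adobe.acrobat.png"
        Case "ps": AdobeConversionID = "com.adobe.acrobat.ps"
        Case "rtf": AdobeConversionID = "com.adobe.acrobat.rtf"
        Case "xlsx": AdobeConversionID = "com.adobe.acrobat.xlsx"
        Case "xls": AdobeConversionID = "com.adobe.acrobat.spreadsheet"
        Case "txt": AdobeConversionID = "com.adobe.acrobat.accesstext"
        Case "tiff", "tif": AdobeConversionID = "com.adobe.acrobat.tiff"
        Case "xml": AdobeConversionID = "com.adobe.acrobat.xml-1-00"
        Case Else: AdobeConversionID = ""
    End Select
End Function

'Hypothetical helper: swaps the ".pdf" extension for the target one.
'As in the original macro, "xls" output is written as an .xml file.
Function OutputPathFor(ByVal pdfPath As String, ByVal ext As String) As String
    ext = LCase$(ext)
    If ext = "xls" Then ext = "xml"
    OutputPathFor = Left$(pdfPath, Len(pdfPath) - 4) & "." & ext
End Function

'Sketch of a message-free conversion. Returns True when the PDF was opened
'and SaveAs/Close ran without a runtime error. Late-bound: no reference needed.
Function SavePDFAsOtherFormatNoMsg(ByVal PDFPath As String, ByVal FileExtension As String) As Boolean
    Dim objAcroApp As Object, objAcroAVDoc As Object, objJSO As Object
    Dim convID As String

    convID = AdobeConversionID(FileExtension)
    If convID = "" Then Exit Function
    If LCase$(Right$(PDFPath, 4)) <> ".pdf" Then Exit Function
    If Dir(PDFPath) = "" Then Exit Function

    On Error GoTo CleanUp
    Set objAcroApp = CreateObject("AcroExch.App")
    Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")
    If objAcroAVDoc.Open(PDFPath, "") Then
        Set objJSO = objAcroAVDoc.GetPDDoc.GetJSObject
        objJSO.SaveAs OutputPathFor(PDFPath, FileExtension), convID
        objAcroAVDoc.Close True   'close without saving changes to the PDF
        SavePDFAsOtherFormatNoMsg = True
    End If

CleanUp:
    If Not objAcroApp Is Nothing Then objAcroApp.Exit
End Function
```

Returning a Boolean lets a caller such as ExportAllPDFs count failures instead of assuming every file converted.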

http://www.myengineeringworld.net/2013/03/vba-macro-to-convert-pdf-files-into.html
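The answer mentions a tool for collecting the paths of the PDF files in a folder, but that tool is not shown. A minimal sketch using the built-in `Dir` function, assuming the paths should go into column B of a worksheet named "Paths" starting at row 2 (the layout ExportAllPDFs reads). `CollectPDFPaths` and `ListPDFPaths` are hypothetical names, and subfolders are not searched.

```vba
Option Explicit

'Hypothetical helper: returns the full paths of all *.pdf files directly
'inside folderPath (subfolders are not searched).
Function CollectPDFPaths(ByVal folderPath As String) As Collection
    Dim paths As New Collection
    Dim fileName As String

    If Right$(folderPath, 1) <> "\" Then folderPath = folderPath & "\"

    fileName = Dir(folderPath & "*.pdf")
    Do While fileName <> ""
        paths.Add folderPath & fileName
        fileName = Dir()          'next match for the same pattern
    Loop

    Set CollectPDFPaths = paths
End Function

'Hypothetical driver: writes the paths to column B of the "Paths" sheet,
'starting at B2, after clearing any previous list.
Sub ListPDFPaths(ByVal folderPath As String)
    Dim paths As Collection
    Dim i As Long

    Set paths = CollectPDFPaths(folderPath)
    With ThisWorkbook.Worksheets("Paths")
        .Range("B2:B" & .Rows.Count).ClearContents
        For i = 1 To paths.Count
            .Cells(i + 1, 2).Value = paths(i)
        Next i
    End With
End Sub
```

With around 400 files, running `ListPDFPaths` on the folder and then `ExportAllPDFs` avoids typing the paths by hand.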
