2011-07-19

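Extracting raw data from tables in Word using Perl?

I am trying to extract data from multiple tables in a Word document. When I try to convert the data in a table to text, I get an error. The ConvertToText method has two optional parameters (how to separate the data, and a boolean). Here is my current code: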

#!/usr/bin/perl
#OLEWord.pl 

#Use strict and print warnings
use strict;use warnings; 
#Using OLE + OLE constants for Variants and OLE enumeration for Enumerations 
use Win32::OLE qw(in); 
use Win32::OLE::Const 'Microsoft Word'; 
use Win32::OLE::Variant; 

my $var1 = Win32::OLE::Variant->new(VT_BOOL, 'true'); 

$Win32::OLE::Warn = 3; 

#set the file to be opened 
my $file = 'C:\work\SCL_International Financial New Fund Setup Questionnaire V1.6.docx'; 

#Reuse a running Word instance or start a new one; die if the application could not be opened
my $MSWord = Win32::OLE->GetActiveObject('Word.Application') || Win32::OLE->new('Word.Application','Quit')
    or die "Could not start Word: ", Win32::OLE->LastError();

#Set the screen to Visible, so that you can see what is going on 
$MSWord->{'Visible'} = 1; 
$MSWord->{'DisplayAlerts'} = 0; #Suppress alerts, such as 'Save As....'

#open the request file or die and print warning message 
my $Doc = $MSWord->{'Documents'}->Open($file) or die "Could not open ", $file, " Error:", Win32::OLE->LastError(); 

#$MSWord->ActiveDocument->SaveAs({Filename => 'AlteredTest.docx', 
          #FileFormat => wdFormatDocument}); 

my $tables = $MSWord->ActiveDocument->{'Tables'}; 

for my $table (in $tables){ 
    my $tableText = $table->ConverToText(wdSeparateByParagraphs,$var1); 
    print "Table: ", $tableText, "\n"; 
} 


$MSWord->ActiveDocument->Close; 
$MSWord->Quit; 

and I get this error:

Bareword "VT_BOOL" not allowed while "strict subs" in use at OLEWord.pl line 31
Bareword "true" not allowed while "strict subs" in use at OLEWord.pl line 31
Execution of OLEWord.pl aborted due to compilation errors.

Answers


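When something like VT_BOOL is not defined as a constant, Perl treats it as a bareword. Others have already provided information about barewords.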

The root cause of the problem is that the constants exported by the Win32::OLE::Variant module are missing. Add:

use Win32::OLE::Variant; 

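This removes the first error from the script. The second one is a similar problem: true is not defined either. Replace it with 1, or define it yourself as a constant: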

use constant true => 1; 
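A minimal sketch of why defining a constant makes the error go away, using only core Perl: `use constant` turns the name into a subroutine at compile time, so "strict subs" no longer sees a bareword, while a name nobody declared still fails to compile. `NOT_DECLARED_ANYWHERE` is a hypothetical name standing in for VT_BOOL when Win32::OLE::Variant has not been imported.

```perl
use strict;
use warnings;

# A constant is compiled as a subroutine, so once it is declared the
# name is no longer a bareword and "strict subs" accepts it.
use constant true => 1;

my $flag = true;

# Compile a snippet that uses a name that was never declared (hypothetical
# NOT_DECLARED_ANYWHERE). Under "strict subs" this is a compile-time
# error of the same kind the question reports for VT_BOOL.
my $compiled = eval q{
    use strict;
    my $value = NOT_DECLARED_ANYWHERE;
    1;
};
my $error = $compiled ? '' : $@;

print "true is $flag\n";
print "undeclared bareword rejected: ",
    ($error =~ /Bareword "NOT_DECLARED_ANYWHERE" not allowed/ ? 'yes' : 'no'), "\n";
```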

Edit: here is an example of extracting the table text:

my $tables = $MSWord->ActiveDocument->{'Tables'}; 
for my $table (in $tables){ 
    my $tableText = $table->ConvertToText({ Separator => wdSeparateByTabs }); 
    print "Table: ", $tableText->Text(), "\n"; 
} 

There is a typo in the method name ConverToText in your code. This method returns a Range object, so you have to use the Text method to get the actual text.
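Once you have the text, you will probably want it split back into cells. A minimal sketch, assuming the Range text produced by converting with wdSeparateByTabs has one row per paragraph (ending in "\r"), tab-separated cells, and no cell containing more than one paragraph. `split_converted_table` is a hypothetical helper, and the sample string stands in for `$tableText->Text()`.

```perl
use strict;
use warnings;

# Split the text of a Range returned by
#   $table->ConvertToText({ Separator => wdSeparateByTabs })
# into rows of cells. Assumption: each converted row ends with "\r"
# (Word's paragraph mark) and cells within a row are separated by "\t".
sub split_converted_table {
    my ($text) = @_;
    my @rows;
    for my $line (split /\r/, $text) {
        next if $line eq '';                     # skip empty paragraphs
        push @rows, [ split /\t/, $line, -1 ];   # -1 keeps trailing empty cells
    }
    return \@rows;
}

# Sample text standing in for $tableText->Text()
my $rows = split_converted_table("Fund\tCurrency\rAlpha\tUSD\rBeta\t\r");
print join(' | ', @$_), "\n" for @$rows;
```

The `-1` limit matters for tables with blank cells at the end of a row; without it Perl's split drops them and rows end up with different lengths.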


Yes, I forgot that, thanks. But how do I extract the data from the tables in Word? – Shahab


@Shahab - see my updated answer for the table extraction code. – bvr


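Hmm, I get an error when running it: OLE exception from "Microsoft Word": This method or property is not available because some or all of the object does not refer to a table -> in ConverToText. – Shahab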


Removing "use strict" will get rid of the "Bareword" error.


A 'Bareword' error is caused by a syntax error in your code. A 'runaway multi-line' usually pinpoints where the start of the error is, and usually means that a line has not been completed, often because of mismatched brackets or quote marks.

As has been pointed out by several SO-ers, that doesn't look like Perl! The Perl interpreter is balking on a syntax error because it doesn't speak that particular language! (Source)

Without use strict you won't get the warning. (But you should use it to write good code.)
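To see why dropping use strict only hides the problem, here is a small sketch with the hypothetical name `VT_BOOL_NOT_IMPORTED` in place of VT_BOOL when Win32::OLE::Variant has not been imported. Without "strict subs" the bareword silently becomes the string 'VT_BOOL_NOT_IMPORTED', so Win32::OLE::Variant->new would receive that string instead of the numeric type code the real constant stands for; with strict it is caught at compile time.

```perl
use strict;
use warnings;

# Without strict subs, an undeclared bareword is quietly turned into a string.
my $loose = eval q{
    no strict 'subs';
    my $v = VT_BOOL_NOT_IMPORTED;   # hypothetical, never declared
    $v;
};

# With strict subs, the same line is a compile-time error, so eval returns undef.
my $strict_ok = eval q{
    use strict 'subs';
    my $v = VT_BOOL_NOT_IMPORTED;
    1;
};

print "without strict: '$loose'\n";
print "with strict: ", ($strict_ok ? 'compiled' : 'compile error'), "\n";
```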

Read up on barewords so that you know what they are; then you will be able to work out how to fix this error yourself.

Here are some links for reading up on barewords: 1. perl.com 2. alumnus


Thanks. What about extracting the data from the tables? Does the code look right? – Shahab


Extract all of the document's tables into one xls file:

 sub doParseDoc { 

      my $msg  = '' ; 
      my $ret  = 1 ; # assume failure at the beginning ... 

      $msg  = 'START --- doParseDoc' ; 
      $objLogger->LogDebugMsg($msg); 
      $msg  = 'using the following DocFile: "' . $DocFile . '"' ; 
      $objLogger->LogInfoMsg($msg); 
      #----------------------------------------------------------------------- 
      # Modules used below; $DocFile and $objLogger are globals defined elsewhere
      use Carp;
      use Spreadsheet::WriteExcel;
      use Win32::OLE qw(in);


      # Create a new Excel workbook 
      my $objWorkBook = Spreadsheet::WriteExcel->new("$DocFile" . '.xls'); 

      # Add a worksheet 
      my $objWorkSheet = $objWorkBook->add_worksheet(); 


      Win32::OLE->Option(Warn => \&Carp::croak);

      # at this point the Word application should be open in the UI with
      # the DocFile
      # build the MS Word object during run-time 
      # "||" (not "or"), so a new instance is assigned when none is running
      my $objMSWord = Win32::OLE->GetActiveObject('Word.Application')
          || Win32::OLE->new('Word.Application', 'Quit');

      # build the doc object during run-time 
      my $objDoc = $objMSWord->Documents->Open($DocFile) 
       or die "Could not open ", $DocFile, " Error:", Win32::OLE->LastError(); 

      #Set the screen to Visible, so that you can see what is going on 
      $objMSWord->{'Visible'} = 1; 
      # try NOT printing directly to the file 


      #$objMSWord->ActiveDocument->SaveAs({Filename => 'AlteredTest.docx', 
             #FileFormat => wdFormatDocument}); 

      my $tables  = $objMSWord->ActiveDocument->Tables(); 
      my $tableText  = '' ; 
      my $xlsRow  = 1 ; 

      for my $table (in $tables){ 
       # extract the table text as a single string 
       #$tableText = $table->ConvertToText({ Separator => 'wdSeparateByTabs' }); 
       # cheated those properties from here: 
       # https://msdn.microsoft.com/en-us/library/aa537149(v=office.11).aspx#officewordautomatingtablesdata_populateatablewithdata 
       my $RowsCount = $table->{'Rows'}->{'Count'} ; 
       my $ColsCount = $table->{'Columns'}->{'Count'} ; 

       # discard the tables that do not have exactly 5 columns
       next unless ($ColsCount == 5) ; 

       $msg   = "Rows Count: $RowsCount " ; 
       $msg   .= "Cols Count: $ColsCount " ; 
       $objLogger->LogDebugMsg ($msg) ; 

       #my $tableRange = $table->ConvertToText({ Separator => '##' }); 
       # OBS !!! simple print WILL print to your doc file use Select ?! 
       #$objLogger->LogDebugMsg ($tableRange . "\n"); 
       # Word's Cell(row, col) indices are 1-based, so start at 1
       foreach my $row (1..$RowsCount) {
        foreach my $col (1..$ColsCount) {

         # nope ... $table->cell($row,$col)->->{'WrapText'} = 1 ;
         # nope $table->cell($row,$col)->{'WordWrap'} = 1 ;
         # so so $table->cell($row,$col)->WordWrap() ;

         my $txt = '';
         # well some 1% of the values are so nasty that we really give up on them ...
         eval {
          $txt = $table->cell($row,$col)->range->{'Text'};
          #replace all the ctrl chars by space
          $txt =~ s/\r/ /g ;
          $txt =~ s/[^\040-\176]/ /g ;
          # perform some cleansing - ColName<primary key>=> ColName
          #$txt =~ s#^(.[a-zA-Z_0-9]*)(\<.*)#$1#g ;

          # this will most probably break your cmd ...
          # $objLogger->LogDebugMsg ("row: $row , col: $col with txt: $txt \n") ;
          1 ; # keep the eval true even when the last substitution changed nothing
         } or $txt = 'N/A' ;

         # write the cell text; spreadsheet columns are 0-based, Word's are 1-based
         $objWorkSheet->write($xlsRow, $col - 1, $txt);

        } #eof foreach col

        # we just want to dump all the tables into the one sheet
        $xlsRow++ ;
       } #eof foreach row
       sleep 1 ; 
      } #eof foreach table 

      # close the opened in the UI document 
      $objMSWord->ActiveDocument->Close; 

      # OBS !!! now we are able to print 
      $objLogger->LogDebugMsg ($tableText . "\n"); 

      # exit the whole Word application
      $objMSWord->Quit;

      # write out and close the .xls file
      $objWorkBook->close();

      $ret = 0 ; # success
      return ($ret , $msg) ;
    } 
    #eof sub doParseDoc 
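The per-cell logic in the sub above can be pulled out of the OLE code so it can be tested without Word. A minimal sketch, assuming Word ends each cell's Range text with the end-of-cell marker "\r\a" (chr 13 followed by chr 7). `clean_cell_text`, `table_to_rows` and the `$get_cell` callback are hypothetical names; the callback stands in for `$table->Cell($row, $col)->Range->{Text}`. Like the original, the character-class substitution turns any non-ASCII character into a space.

```perl
use strict;
use warnings;

# Clean one cell's Range text. Assumption: the text ends with Word's
# end-of-cell marker "\r\a"; undef (an unreadable cell) becomes 'N/A'.
sub clean_cell_text {
    my ($txt) = @_;
    return 'N/A' unless defined $txt;
    $txt =~ s/\r\a\z//;            # drop the end-of-cell marker
    $txt =~ s/[^\040-\176]/ /g;    # other control / non-ASCII chars -> space
    $txt =~ s/^\s+|\s+\z//g;       # trim leading and trailing whitespace
    return $txt;
}

# Walk a table with 1-based row/column indices (as Word's Cell() expects)
# and return rows as 0-based Perl arrays, ready for a spreadsheet writer.
# $get_cell is a hypothetical callback ($row, $col) -> raw cell text.
sub table_to_rows {
    my ($row_count, $col_count, $get_cell) = @_;
    my @rows;
    for my $row (1 .. $row_count) {
        my @cells;
        for my $col (1 .. $col_count) {
            # a cell that cannot be read yields undef here, hence 'N/A'
            my $txt = eval { $get_cell->($row, $col) };
            push @cells, clean_cell_text($txt);
        }
        push @rows, \@cells;
    }
    return \@rows;
}

# Fake 2x2 table; column 3 is requested to simulate a missing cell.
my @fake = ( [ "ISIN\r\a", "Name\r\a" ], [ "LU0001\r\a", "Fund A\r\a" ] );
my $rows = table_to_rows(2, 3, sub {
    my ($r, $c) = @_;
    die "no such cell\n" if $c > 2;
    return $fake[$r - 1][$c - 1];
});
print join("\t", @$_), "\n" for @$rows;
```

Inside doParseDoc, each returned row could then be written in one call with Spreadsheet::WriteExcel's `write_row($xlsRow, 0, $cells)`.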