2012-08-01 102 views

Answers

0

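Finding where pages end in a Word document is not easy, although Word does write w:lastRenderedPageBreak elements into the file. Those elements only record where page breaks fell the last time Word laid out the document, so they can be missing or out of date.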
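If you do want to try the w:lastRenderedPageBreak route on a .docx file, you can read the page hints straight out of the package without Word. This is a minimal sketch, not a reliable page splitter. It assumes the file was last saved by Word, since other producers may not write these elements. It only collects w:t run text, and the class and method names are hypothetical. On .NET Framework, ZipFile needs a reference to System.IO.Compression.FileSystem.

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class RenderedPageSplitter
{
    // WordprocessingML main namespace used by word/document.xml in a .docx package.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Loads word/document.xml from a .docx (which is a ZIP package).
    // GetEntry returns null if the entry is missing; not handled in this sketch.
    public static XDocument LoadMainPart(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        using (var stream = zip.GetEntry("word/document.xml").Open())
        {
            return XDocument.Load(stream);
        }
    }

    // Splits the text into "pages" wherever Word left a w:lastRenderedPageBreak.
    // These breaks are only hints from Word's last layout pass.
    public static List<string> SplitPages(XDocument doc)
    {
        var pages = new List<string>();
        var current = new StringBuilder();

        // Only top-level paragraphs, so text in nested paragraphs (e.g. text boxes)
        // is visited once, through its enclosing paragraph.
        var paragraphs = doc.Descendants(W + "p").Where(p => !p.Ancestors(W + "p").Any());
        foreach (XElement para in paragraphs)
        {
            foreach (XElement e in para.Descendants())
            {
                if (e.Name == W + "lastRenderedPageBreak")
                {
                    // A rendered page break: close the current page and start a new one.
                    pages.Add(current.ToString());
                    current.Clear();
                }
                else if (e.Name == W + "t")
                {
                    current.Append(e.Value);
                }
            }
            current.Append('\n'); // paragraph boundary
        }
        pages.Add(current.ToString());
        return pages;
    }
}
```

Typical use would be `RenderedPageSplitter.SplitPages(RenderedPageSplitter.LoadMainPart(@"path\to\file.docx"))`. Treat the result as approximate, for the reasons given above.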

A better approach is to have your OCR program insert some kind of marker into the document between each block of converted text.
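Once the markers are in place, splitting comes down to plain string handling. Here is a minimal sketch. It assumes a hypothetical marker string `<<<OCR-BLOCK>>>` that never appears in recognized text. It also assumes you already have the document's text, for example from `doc.Content.Text` through the Word object model.

```csharp
using System;
using System.Linq;

static class MarkerSplitter
{
    // Hypothetical marker; use whatever string your OCR tool can emit
    // and that cannot occur in the recognized text itself.
    public const string BlockMarker = "<<<OCR-BLOCK>>>";

    // Splits the extracted document text back into the original OCR blocks,
    // trimming surrounding whitespace/paragraph marks and dropping empty blocks.
    public static string[] SplitBlocks(string documentText)
    {
        return documentText
            .Split(new[] { BlockMarker }, StringSplitOptions.None)
            .Select(block => block.Trim())
            .Where(block => block.Length > 0)
            .ToArray();
    }
}
```

Empty blocks are dropped, so a blank OCR page will not produce an entry. Remove the `Where` call if you need to keep them.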

Then, depending on which kind of Word document it is, process the file with the appropriate tool.
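One way to tell which kind of Word document you have is to check the file signature rather than trusting the extension. .docx files are ZIP packages, while legacy .doc files are OLE2 compound files. This is a minimal sketch, and the type and method names are hypothetical. It only identifies the container, so an .xlsx or .xls file would be classified the same way.

```csharp
using System.IO;
using System.Linq;

enum WordFileKind { Unknown, BinaryDoc, OpenXmlDocx }

static class WordFormat
{
    // .docx files are ZIP packages, which start with "PK\x03\x04".
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Legacy binary .doc files are OLE2 compound files with this 8-byte signature.
    static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Classifies a file by its first bytes.
    public static WordFileKind Detect(byte[] header)
    {
        if (StartsWith(header, ZipSignature)) return WordFileKind.OpenXmlDocx;
        if (StartsWith(header, OleSignature)) return WordFileKind.BinaryDoc;
        return WordFileKind.Unknown;
    }

    // Reads up to 8 header bytes from disk and classifies them.
    public static WordFileKind Detect(string path)
    {
        var header = new byte[8];
        using (var fs = File.OpenRead(path))
        {
            int read = fs.Read(header, 0, header.Length);
            return Detect(header.Take(read).ToArray());
        }
    }

    static bool StartsWith(byte[] data, byte[] prefix) =>
        data.Length >= prefix.Length && data.Take(prefix.Length).SequenceEqual(prefix);
}
```

For example, a `BinaryDoc` result could be handed to the Word object model. An `OpenXmlDocx` file can also be opened as a ZIP package and read as XML.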

3

If you have Word installed, you can use the Word object model to process Word documents from C#.

First, add a reference to the Word object model: right-click the project, then Add Reference... -> COM -> Microsoft Word 14.0 Object Library (or similar, depending on your version of Word).

Then you can use the following code:

using Microsoft.Office.Interop.Word; 
//for older versions of Word use: 
//using Word; 

namespace WordSplitter { 
    class Program { 
     static void Main(string[] args) { 
      //Create a new instance of Word 
      var app = new Application(); 

      //Show the Word instance. 
      //If the code runs too slowly, you can show the application at the end of the program 
      //Make sure it works properly first; otherwise, you'll get an error in a hidden window 
      //(If it still runs too slowly, there are a few other ways to reduce screen updating) 
      app.Visible = true; 

      //We need a reference to the source document 
      //It should be possible to get a reference to an open Word document, but I haven't tried it 
      var doc = app.Documents.Open(@"path\to\file.doc"); 
      //(Can also use .docx) 

      //Information returns an object (dynamic when interop types are embedded), so cast it to int 
      int pageCount = (int)doc.Range().Information[WdInformation.wdNumberOfPagesInDocument]; 

      //We'll hold the start position of each page here 
      int pageStart = 0; 

      for (int currentPageIndex = 1; currentPageIndex <= pageCount; currentPageIndex++) { 
       //This Range object will contain each page. 
       var page = doc.Range(pageStart); 

       //Range.End is the position just after the last character, so a page ends 
       //exactly where the next page starts. 
       //However, we need to handle the last page: since there is no next page, the 
       //GoTo method would move to the *start* of the last page. 
       if (currentPageIndex < pageCount) { 
        //page.GoTo returns a new Range object, leaving the page object unaffected 
        page.End = page.GoTo(
         What: WdGoToItem.wdGoToPage, 
         Which: WdGoToDirection.wdGoToAbsolute, 
         Count: currentPageIndex + 1 
        ).Start; 
       } else { 
        page.End = doc.Range().End; 
       } 
       pageStart = page.End; 

       //Copy and paste the contents of the Range into a new document 
       page.Copy(); 
       var doc2 = app.Documents.Add(); 
       doc2.Range().Paste(); 
      } 
     } 
    } 
} 

Reference: Word Object Model Overview on MSDN

Thanks dear @ZevSpitz – Iman 2012-08-03 08:11:23

This is a perfect starting point for building something useful. – 2012-10-16 15:12:45

4

Same idea as the other answer, but wrapped in an iterator extension method on Document that returns IEnumerable<Range>.

using System.Collections.Generic; 
using Microsoft.Office.Interop.Word; 

static class PagesExtension { 
    public static IEnumerable<Range> Pages(this Document doc) { 
     int pageCount = (int)doc.Range().Information[WdInformation.wdNumberOfPagesInDocument]; 
     int pageStart = 0; 
     for (int currentPageIndex = 1; currentPageIndex <= pageCount; currentPageIndex++) { 
      var page = doc.Range(pageStart); 
      if (currentPageIndex < pageCount) { 
       //page.GoTo returns a new Range object, leaving the page object unaffected 
       page.End = page.GoTo(
        What: WdGoToItem.wdGoToPage, 
        Which: WdGoToDirection.wdGoToAbsolute, 
        Count: currentPageIndex+1 
       ).Start; //Range.End is exclusive, so the page ends where the next one starts 
      } else { 
       page.End = doc.Range().End; 
      } 
      pageStart = page.End; 
      yield return page; 
     } 
    } 
} 
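You can check the boundary arithmetic inside Pages() without Word by pulling it into a pure function. Here is a minimal sketch. It assumes you have already collected the start position of each page (for example from `GoTo(...).Start`) and the document's end position. `PageRangeMath` and `PageRanges` are hypothetical names. End is treated as exclusive, matching the way Word's Range.End is the position just after the last character.

```csharp
using System.Collections.Generic;

static class PageRangeMath
{
    // Hypothetical helper: yields one (Start, End) pair per page, End exclusive.
    // pageStarts[i] is the start position of page i + 1; docEnd is the document end.
    public static IEnumerable<(int Start, int End)> PageRanges(IReadOnlyList<int> pageStarts, int docEnd)
    {
        for (int i = 0; i < pageStarts.Count; i++)
        {
            // Each page ends where the next begins; the last page ends at the document end.
            int end = i + 1 < pageStarts.Count ? pageStarts[i + 1] : docEnd;
            yield return (pageStarts[i], end);
        }
    }
}
```

For example, `PageRanges(new[] { 0, 120, 250 }, 300)` yields (0, 120), (120, 250) and (250, 300). As in Pages(), the iterator computes each range lazily, only when the caller asks for it.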

The main code then ends up like this:

static void Main(string[] args) { 
    var app = new Application(); 
    app.Visible = true; 
    var doc = app.Documents.Open(@"path\to\source\document"); 
    foreach (var page in doc.Pages()) { 
     page.Copy(); 
     var doc2 = app.Documents.Add(); 
     doc2.Range().Paste(); 
    } 
} 