2011-08-27 33 views

Answers

3

The text recognized by OneNote's OCR is stored in the one:OCRText element of the page's XML structure. For example:

<one:Page ...> 
    ... 
    <one:Image ...> 
     ... 
     <one:OCRData lang="en-US"> 
      <one:OCRText><![CDATA[This is some sampletext]]></one:OCRText> 
     </one:OCRData> 
    </one:Image> 
</one:Page> 

You can view this XML with a program called OMSPY, which shows the XML behind your OneNote pages - http://blogs.msdn.com/b/johnguin/archive/2011/07/28/onenote-spy-omspy-for-onenote-2010.aspx

To extract the text, you can use OneNote's COM interop (as you pointed out). For example:

using System.Linq;
using System.Xml.Linq;
using Microsoft.Office.Interop.OneNote;

// Instantiate OneNote
ApplicationClass onApp = new ApplicationClass();

// Get the XML of the page with the given ID
string xml;
onApp.GetPageContent("put the page id here", out xml);

// Load it into an XDocument (from System.Xml.Linq)
XDocument xDoc = XDocument.Parse(xml);

// OneNote's XML namespace (this one is for OneNote 2010)
XNamespace one = "http://schemas.microsoft.com/office/onenote/2010/onenote";

// Collect the text of every OCRText element on the page
string[] OCRText = xDoc.Descendants(one + "OCRText").Select(x => x.Value).ToArray();
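
If you want to try the parsing step without OneNote running, here is a minimal sketch that applies the same LINQ to XML query to a hand-written string shaped like the sample page above. The class name OcrTextDemo, the method ExtractOcrText, and the sample string are made up for illustration; real page XML from GetPageContent has many more elements and attributes.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class OcrTextDemo
{
    // OneNote 2010 namespace, as used in the answer above.
    static readonly XNamespace One =
        "http://schemas.microsoft.com/office/onenote/2010/onenote";

    // Hypothetical helper: returns one string per one:OCRText element in the page XML.
    public static string[] ExtractOcrText(string pageXml)
    {
        return XDocument.Parse(pageXml)
            .Descendants(One + "OCRText")
            .Select(e => e.Value)
            .ToArray();
    }

    static void Main()
    {
        // Hand-written stand-in for the XML that GetPageContent would return.
        string sampleXml =
            "<one:Page xmlns:one=\"http://schemas.microsoft.com/office/onenote/2010/onenote\">" +
            "<one:Image>" +
            "<one:OCRData lang=\"en-US\">" +
            "<one:OCRText><![CDATA[This is some sampletext]]></one:OCRText>" +
            "</one:OCRData>" +
            "</one:Image>" +
            "</one:Page>";

        foreach (string text in ExtractOcrText(sampleXml))
            Console.WriteLine(text);
    }
}
```

The namespace has to match the one declared on the page's root element. If the query comes back empty on real page XML, check the xmlns:one value in OMSPY first, since it differs between OneNote versions.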

See the "Application interface" documentation on MSDN for more information - http://msdn.microsoft.com/en-us/library/gg649853.aspx