How do I extract images from a Word document using JavaScript?

I am trying to extract images from a Word document from within a JavaScript file, using ActiveXObject (IE only).

I have not been able to find any API reference for the Word object, only a few hints scattered around the internet:

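var filename = 'path/to/word/doc.docx' 
var word = new ActiveXObject('Word.Application') 
// Open the document through the Word application object created above 
var doc = word.Documents.Open(filename) 
// The document's main text (Content is a Range object) 
var docText = doc.Content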

How would I access the images in the Word document, in a way similar to doc.Content?

Also, if anyone has a definitive source for the API (preferably from Microsoft), that would be very helpful.

Source

2013-03-04 CSamp

http://msdn.microsoft.com/en-us/office/aa905496.aspx – 2013-03-04 23:28:24

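Ken, thanks for this link! I knew it was out there somewhere, but I could not find it for the life of me. I will see whether I can find the answer to this question there. – CSamp 2013-03-04 23:42:35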

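So after a few weeks of research, I found that the easiest way to extract the images is to use the SaveAs function of the Word ActiveXObject. If the file is saved as an HTML document, Word creates a folder containing the images. From there, you can fetch the HTML file with XMLHttp and create new IMG tags that the browser can display (I am using IE 9, since ActiveXObject only works in Internet Explorer).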

Let's start with the SaveAs part:

// Define the path to the file 
var filepath = 'path/to/the/word/doc.docx' 
// Make a new ActiveXWord application 
var word = new ActiveXObject('Word.Application') 
// Open the document 
var doc = word.Documents.Open(filepath) 
// Save the DOCX as an HTML file (the 8 specifies you want to save it as an HTML document) 
doc.SaveAs(filepath + '.htm', 8)
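When this runs over many documents, it may help to close each document and quit Word afterwards so that Word instances do not pile up. The following is only a sketch of that idea, not part of the original answer: `saveDocAsHtml` and its `createWord` parameter are hypothetical names, and passing `0` to `Close` is assumed to mean "do not save changes" (Word's `wdDoNotSaveChanges`).

```javascript
// Hypothetical helper: createWord is any function that returns a
// Word.Application-like object. In IE this could be:
//   function () { return new ActiveXObject('Word.Application'); }
function saveDocAsHtml(createWord, filepath) {
  var WD_FORMAT_HTML = 8; // the same format value the answer passes to SaveAs
  var word = createWord();
  var doc = null;
  try {
    doc = word.Documents.Open(filepath);
    var htmlPath = filepath + '.htm';
    doc.SaveAs(htmlPath, WD_FORMAT_HTML);
    return htmlPath;
  } finally {
    // Runs even if Open or SaveAs throws, so Word is not left running.
    if (doc) {
      doc.Close(0); // assumed: 0 = close without saving changes
    }
    word.Quit();
  }
}
```

Taking the factory as a parameter keeps the helper independent of ActiveXObject, so it can be tried out with a stand-in object outside Internet Explorer.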

We should now have a folder containing the image files in the same directory.

Note: in Word HTML, images use <v:imagedata> tags, which are stored inside <v:shape> tags; for example:

<v:shape style="width: 241.5pt; height: 71.25pt;"> 
    <v:imagedata src="path/to/the/word/doc.docx_files/image001.png"> 
     ... 
    </v:imagedata> 
</v:shape>

I have removed the extra attributes and tags that Word saves.
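The src in the sample markup suggests that the image folder is named after the saved HTML file, with the final extension replaced by `_files`. A small sketch of that mapping follows; `imageFolderFor` is a hypothetical helper, and the `_files` suffix is inferred only from the example above, so other Word versions or language editions may name the folder differently.

```javascript
// Hypothetical helper: derive the image folder from the saved HTML path.
// Assumes the "<name>_files" convention seen in the sample src attribute.
function imageFolderFor(htmlPath) {
  // Drop only the final ".htm" or ".html" extension, then append the suffix.
  return htmlPath.replace(/\.html?$/i, '') + '_files';
}

// 'path/to/the/word/doc.docx.htm' maps to 'path/to/the/word/doc.docx_files'
var folder = imageFolderFor('path/to/the/word/doc.docx' + '.htm');
```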

To access the HTML from JavaScript, use the XMLHttpRequest object.

var xmlhttp = new XMLHttpRequest() 
var html_text = ""

Because I was accessing hundreds of Word documents, I found it best to define the XMLHTTP onreadystatechange callback before calling send.

// Define the onreadystatechange callback function 
xmlhttp.onreadystatechange = function() { 
    // Check to make sure the response has fully loaded 
    if (xmlhttp.readyState==4 && xmlhttp.status==200) { 
     // Grab the response text 
     html_text = xmlhttp.responseText 
     // Load the HTML into the innerHTML of a DIV to add the HTML to the DOM 
     document.getElementById('doc_html').innerHTML=html_text.replace("<html>", "").replace("</html>","") 
     // Define a new array of all HTML elements with the "v:imagedata" tag 
     var images =document.getElementById('doc_html').getElementsByTagName("v:imagedata") 
     // Loop through each image 
     for(var j=0;j<images.length;j++) { 
      // Grab the source attribute to get the image name 
      var src = images[j].getAttribute('src') 
      // Check to make sure the image has a 'src' attribute 
      if(src!=undefined) { 
       ...

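I had a lot of trouble getting the src attribute to load correctly because of the way IE escapes HTML attributes when it loads them into the innerHTML of the doc_html DIV, so in the example below I use a pseudo path and src.split('/')[1] to grab the image name (this method will not work if there is more than one forward slash!):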

   ... 
       images[j].setAttribute('src', '/path/to/the/folder/containing/the/images/'+src.split('/')[1]) 
       ...
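The limitation mentioned above comes from always taking the second path segment: for a src such as 'path/to/the/word/doc.docx_files/image001.png', src.split('/')[1] yields 'to'. One possible workaround is to take the last segment instead. This is a sketch only; `imageNameFromSrc` is a hypothetical helper, and also splitting on backslashes is an assumption about what Word might emit for local paths.

```javascript
// Hypothetical helper: return the file name portion of an image src,
// regardless of how many path separators precede it.
function imageNameFromSrc(src) {
  // Split on forward slashes and backslashes, then keep the last segment.
  var parts = String(src).split(/[\\/]/);
  return parts[parts.length - 1];
}

var name = imageNameFromSrc('path/to/the/word/doc.docx_files/image001.png');
```

It could replace src.split('/')[1] in the setAttribute call above without changing the rest of the loop.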

Here we add the new img tag to the HTML DIV by appending it to the parent (which happens to be a p object) of the image's parent (the v:shape object).

   ... 
       images[j].parentElement.parentElement.innerHTML+="<img src='"+images[j].getAttribute('src')+"' style='"+images[j].parentElement.getAttribute('style')+"'>" 

      } 
     }  
    } 
} 
// Read the HTML Document using XMLHttpRequest 
xmlhttp.open("POST", filepath + '.htm', false) 
xmlhttp.send()

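Although it is a bit of a hack, the method above successfully adds the IMG tags: we grab the src attribute from each image and the style information from its v:shape element, then append a new img tag to the innerHTML at the place where the image sits in the original document's HTML.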
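Because the img tag is assembled by string concatenation, a quote character in the src or style value would break the markup, and a missing style attribute would end up as the literal text 'null'. A minimal sketch of a safer string builder follows; `escapeAttr` and `buildImgTag` are hypothetical names, not part of the original code.

```javascript
// Hypothetical helper: escape a value for use inside a double-quoted
// HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: build the <img> markup from the image src and the
// style copied from the v:shape element. The style attribute is omitted
// when no style was found (getAttribute returns null in that case).
function buildImgTag(src, style) {
  var tag = '<img src="' + escapeAttr(src) + '"';
  if (style) {
    tag += ' style="' + escapeAttr(style) + '"';
  }
  return tag + '>';
}

var tag = buildImgTag('/img/image001.png', 'width: 241.5pt; height: 71.25pt;');
```

The result could be appended to innerHTML in place of the concatenated string used in the loop.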

Source

2013-03-08 00:57:39 CSamp
