2010-04-14 193 views
2

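Programmatically accessing the document properties of a Word 2007 document

Is there a way to programmatically access the document properties of a Word 2007 document?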

I am open to using any language for this, but ideally it would be done through a PowerShell script.

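My overall goal is to traverse the documents somewhere on the file system, parse certain document properties from each of them, and then collate all of those properties into a new Word document.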

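Essentially, I want to automatically create a document that lists all the documents under a given folder on the file system. The list would contain things such as the Title, Abstract, and Author document properties, the CreateDate field, and so on.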

Answers

1

My guess is that your best bet is VB or C# with the Office Interop Assemblies. I am not aware of a native way (within PowerShell) to do what you want.

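That said, if you use VB or C#, you could write a PowerShell cmdlet that does whatever kind of collation you are after. At that point, though, it may be simpler to write a console application that runs as a scheduled task.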

1

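I recently learned from watching a DNRTV show that Office 2007 documents are just zipped XML. So you can rename "Document.docx" to "Document.docx.zip" and look at the XML files inside. You can probably get the properties through the interop assemblies in .NET, but reading the right XML file directly may be more efficient (perhaps with LINQ to XML, or some native way I am not aware of).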
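
As a rough illustration of the "zipped XML" point, the sketch below reads `docProps/core.xml` straight out of the package without renaming or extracting anything. It assumes .NET 4.5 or later (for `System.IO.Compression.ZipFile`) and PowerShell 3 or later. The function name `Get-OfficeCoreXml` and the folder in the usage comment are hypothetical names chosen for this example.

```powershell
# ZipFile lives in the System.IO.Compression.FileSystem assembly (.NET 4.5+)
Add-Type -AssemblyName System.IO.Compression.FileSystem

function Get-OfficeCoreXml {
    param([Parameter(Mandatory)][string]$Path)

    # .NET resolves relative paths against the process directory, not the
    # current PowerShell location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        # Core properties are stored in this part of the package
        $entry = $zip.GetEntry('docProps/core.xml')
        if ($null -eq $entry) { return $null }

        $stream = $entry.Open()
        try {
            # Loading from the stream lets the XML parser detect the encoding
            $xml = New-Object System.Xml.XmlDocument
            $xml.Load($stream)
            return $xml
        }
        finally { $stream.Dispose() }
    }
    finally { $zip.Dispose() }
}

# Hypothetical usage: print the creator of every .docx under a folder.
# Get-ChildItem -Path 'D:\documents' -Filter *.docx -Recurse | ForEach-Object {
#     $core = Get-OfficeCoreXml -Path $_.FullName
#     $node = $core.DocumentElement.SelectSingleNode("*[local-name()='creator']")
#     '{0}: {1}' -f $_.Name, $node.InnerText
# }
```

Opening the archive read-only and disposing it in `finally` means no temporary folder has to be created or cleaned up, which may matter when many files are processed in a loop.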

2

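I needed to do this in PowerShell running on a server that does not have the MS Office applications installed. As indicated above, the trick is to look inside the Office file and examine the XML files embedded in it.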

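Here is a function that runs like a cmdlet, which means you can simply save the script to your PowerShell scripts directory and call the function from any other PowerShell script.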

# DocumentOfficePropertiesGet 
# Example usage 
# From a PowerShell script: 
#  $props = Invoke-Expression "c:\PowerShellScriptFolder\DocumentOfficePropertiesGet.ps1 -DocumentFullPathName ""d:\documents\my excel doc.xlsx"" -OfficeProperties ""dcterms:created;dcterms:modified""" 

# Parameters 

# DocumentFullPathName -- full path and name of MS Office document 
# OfficeProperties -- semi-colon delimited string of property names as they 
#    appear in the core.xml file. To see these names, rename any 
#    MS Office document file to have the extension .zip, then look inside 
#    the zip file. In the docProps folder open the core.xml file. The 
#    core document properties are nodes under the cp:coreProperties node. 

#   Example: dcterms:created;dcterms:modified;cp:lastModifiedBy 

# Return value 

# The function returns a hashtable object -- in the above example, $props would contain 
# the name-value pairs for the requested MS Office document properties. In the calling script, 
# to get at the values: 

#  $fooProperty = $props.'dcterms:created' 
#  $barProperty = $props.'dcterms:modified' 

[CmdletBinding()] 
    [OutputType([System.Collections.Hashtable])] 
    Param 
    (
     [Parameter(Position=0, 
      Mandatory=$false, 
      HelpMessage="Enter the full path name of the document")] 
      [ValidateNotNullOrEmpty()] 
      [String] $DocumentFullPathName='e:\temp\supplier_List.xlsx', 
     [Parameter(Position=1, 
      Mandatory=$false, 
      HelpMessage="Enter the Office properties semi-colon delimited")] 
      [ValidateNotNullOrEmpty()] 
      [String] $OfficeProperties='dcterms:created; dcterms:modified ;cp:lastModifiedBy;dc:creator' 
    ) 
# We need the FileSystem assembly 
Add-Type -AssemblyName System.IO.Compression.FileSystem 

# This function unzips a zip file -- and it works on MS Office files directly: no need to 
# rename them from foo.xlsx to foo.zip. It expects the full path name of the zip file 
# and the path name for the unzipped files 
function Unzip 
{ 
    param([string]$zipfile, [string]$outpath) 

    [System.IO.Compression.ZipFile]::ExtractToDirectory($zipfile, $outpath) *>$null 
} 

# Remove spaces from the OfficeProperties parameter 
$OfficeProperties = $OfficeProperties.replace(' ','') 

# Compose the name of the folder where we will unzip files 
$zipDirectoryName = $env:TEMP + "\" + "TempZip" 

# delete the zip directory if present 
remove-item $zipDirectoryName -force -recurse -ErrorAction Ignore | out-null 

# create the zip directory 
New-Item -ItemType directory -Path $zipDirectoryName | out-null 

# Unzip the files -- i.e. extract the xml files embedded within the MS Office document 
unzip $DocumentFullPathName $zipDirectoryName 

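# get the docProps\core.xml file as [xml] 
# core.xml is normally UTF-8 without a BOM, so state the encoding explicitly;
# otherwise Windows PowerShell may garble non-ASCII values such as author names
$coreXmlName = $zipDirectoryName + "\docProps\core.xml" 
[xml]$coreXml = get-content -path $coreXmlName -Encoding UTF8 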

# create an array of the requested properties 
$requiredProperties = $OfficeProperties -split ";" 

# create a hashtable to return the values 
$docProperties = @{} 

# Now look for each requested property 
foreach($requiredProperty in $requiredProperties) 
{ 
    # Skip empty entries, e.g. those produced by a trailing semi-colon 
    if([string]::IsNullOrEmpty($requiredProperty)) { continue } 
    # We will be lazy and ignore the namespaces. We need the local name only, 
    # i.e. the part after the prefix (or the whole name if there is no prefix) 
    $localName = ($requiredProperty -split ":")[-1] 
    # Use XPath to fetch the node for this property 
    $thisNode = $coreXml.coreProperties.SelectSingleNode("*[local-name(.) = '$localName']") 
    if($null -eq $thisNode) 
    { 
     # To the hashtable, add the requested property name and its value -- null in this case 
     # (index assignment, unlike Add, does not throw if a name is requested twice) 
     $docProperties[$requiredProperty] = $null 
    } 
    else 
    { 
     # To the hashtable, add the requested property name and its value 
     $docProperties[$requiredProperty] = $thisNode.InnerText 
    } 
} 

#clean up 
remove-item $zipDirectoryName -force -recurse 

# return the properties hashtable. To do this, just write the object to the output stream 
$docProperties 
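
The script above deliberately ignores namespaces and matches on `local-name()`. If two properties in different namespaces ever shared a local name, a namespace-aware lookup would be safer. The following is a minimal sketch of that variant, not part of the original script: the function name `Get-CoreProperty` and the sample XML are made up for illustration, while the three namespace URIs are the ones that core.xml files normally declare for the `cp`, `dc`, and `dcterms` prefixes.

```powershell
function Get-CoreProperty {
    param(
        [Parameter(Mandatory)][xml]$CoreXml,
        [Parameter(Mandatory)][string[]]$Name   # prefixed names, e.g. 'dc:creator'
    )

    # Map the prefixes used in the requested names to their namespace URIs
    $ns = New-Object System.Xml.XmlNamespaceManager($CoreXml.NameTable)
    $ns.AddNamespace('cp', 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties')
    $ns.AddNamespace('dc', 'http://purl.org/dc/elements/1.1/')
    $ns.AddNamespace('dcterms', 'http://purl.org/dc/terms/')

    $result = [ordered]@{}
    foreach ($n in $Name) {
        $key = $n.Trim()
        # The prefixed name is itself a valid XPath step for a child element
        $node = $CoreXml.DocumentElement.SelectSingleNode($key, $ns)
        $result[$key] = if ($null -ne $node) { $node.InnerText } else { $null }
    }
    $result
}

# Shortened, made-up sample of a core.xml file; real files hold more elements
$sample = @'
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>Jane Doe</dc:creator>
  <cp:lastModifiedBy>John Roe</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2010-04-14T09:30:00Z</dcterms:created>
</cp:coreProperties>
'@

$props = Get-CoreProperty -CoreXml ([xml]$sample) -Name 'dc:creator', 'dcterms:created', 'dc:title'
$props['dc:creator']   # Jane Doe
```

Properties that are absent from the file, such as `dc:title` in this sample, come back as `$null`, which mirrors the behaviour of the original script.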