2011-08-29 48 views
1

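Incorrect PDF page count

I'm wondering why the VBS code at the link below does not count PDF pages correctly. In every PDF it seems to report half or fewer of the pages that actually exist.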

http://docs.ongetc.com/index.php?q=content/pdf-pages-counting-using-vb-script

Here is the code, in case you can't access the link above:

' By Chanh Ong 
'File: pdfpagecount.vbs 
' Purpose: count pages in pdf file in folder 
Const OPEN_FILE_FOR_READING = 1 

Set gFso = WScript.CreateObject("Scripting.FileSystemObject") 
Set gShell = WScript.CreateObject ("WSCript.shell") 
Set gNetwork = Wscript.CreateObject("WScript.Network") 

    directory="." 
    set base=gFso.getFolder(directory) 
    call listPDFFile(base) 

Function ReadAllTextFile(filespec) 
    Const ForReading = 1, ForWriting = 2 
    Dim f 
    Set f = gFso.OpenTextFile(filespec, ForReading) 
    ReadAllTextFile = f.ReadAll 
End Function 

function countPage(sString) 
    Dim regEx, Match, Matches, counter, sPattern 
    sPattern = "/Type\s*/Page[^s]" ' capture PDF page count 
    counter = 0 

    Set regEx = New RegExp   ' Create a regular expression. 
    regEx.Pattern = sPattern ' Set the pattern defined above. 
    regEx.IgnoreCase = True   ' Set case insensitivity. 
    regEx.Global = True   ' Set global applicability. 
    set Matches = regEx.Execute(sString) ' Execute search. 
    For Each Match in Matches  ' Iterate Matches collection. 
    counter = counter + 1 
    Next 
    if counter = 0 then 
    counter = 1 
    end if 
    countPage = counter 
End Function 

sub listPDFFile(grp) 
    Set pf = gFso.CreateTextFile("pagecount.txt", True) 
for each file in grp.files 
    if (".pdf" = lcase(right(file,4))) then 
     larray = ReadAllTextFile(file) 
     pages = countPage(larray) 
     wscript.echo "The " & file.name & " PDF file has " & pages & " pages" 
     pf.WriteLine(file.name&","&pages) 
    end if 
next 
    pf.Close 
end sub 

Thanks

+0

The link is dead. –

+0

I've updated the question to include the code. – artwork21

Answers

2

Try this:

Function getPdfPgCnt(ByVal sPath) 
    Dim strTStr 

    With CreateObject("Adodb.Stream") 
     .Open 
     .Charset = "x-ansi" 
     .LoadFromFile sPath 
     strTStr = .ReadText(-1) 
    End With 

    With (New RegExp) 
     .Pattern = "Type\s+/Page[^s]" 
     .IgnoreCase = True 
     .Global = True 
     getPdfPgCnt = .Execute(strTStr).Count 
    End With 

    If getPdfPgCnt = 0 Then getPdfPgCnt = 1 
End Function 

'Usage : getPdfPgCnt("C:\1.pdf") 

Updates #1 and #2:

Option Explicit 

Private Function getPdfPgCnt(ByVal sPath) 'Returns page count of file on passed path 
    Dim strTStr 

    With CreateObject("Adodb.Stream") 
     .Open 
     .Charset = "x-ansi" 
     .LoadFromFile sPath 
     strTStr = .ReadText(-1) 
    End With 

    With (New RegExp) 
     .Pattern = "Type\s*/Page[^s]" 
     .IgnoreCase = True 
     .Global = True 
     getPdfPgCnt = .Execute(strTStr).Count 
    End With 

    If getPdfPgCnt = 0 Then getPdfPgCnt = 1 
End Function 

'-------------------------------- 
Dim oFso, iFile 
Set oFso = CreateObject("Scripting.FileSystemObject") 

'enumerating pdf files in vbs's base directory 
For Each iFile In oFso.getFolder(oFso.GetParentFolderName(WScript.ScriptFullName)).Files 
    If LCase(oFso.GetExtensionName(iFile)) = "pdf" Then WScript.Echo iFile & " has "& getPdfPgCnt(iFile)&" pages." 
Next 
Set oFso = Nothing 
'-------------------------------- 
+0

Should this function replace the function countPage(sString)? – artwork21

+0

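Not exactly. I've combined countPage and ReadAllTextFile into one function, using a Stream object in place of the FSO, along with the regex. I've updated the answer with more detail. –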

+0

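Thanks for the update. Your script counts pages better than the one I originally posted, but some PDFs are still reported as having a single page. Do you have any idea why that happens? Thx – artwork21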

2

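The solution provided (and accepted) only works for a limited number of PDF documents. PDF files frequently compress large blocks of data, including page metadata, so a crude regex search for "Type\s*/Page[^s]" will often miss pages.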
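
As a rough illustration of the point above, the sketch below keeps the regex approach but refuses to return a number when the result cannot be trusted. It assumes that a /Type /ObjStm marker (object streams, introduced in PDF 1.5) signals that page dictionaries may be stored in compressed form. The names readRaw and getVisiblePageCount are hypothetical and not part of the answers above. Nothing is decompressed here, so this only detects the problem rather than solving it.

```vbs
Option Explicit

' Hypothetical helper: reads the whole file as single-byte text,
' the same way getPdfPgCnt does.
Private Function readRaw(ByVal sPath)
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        readRaw = .ReadText(-1)
        .Close
    End With
End Function

' Returns the number of page objects visible as plain text, or -1 when
' that number cannot be trusted: either no page object is visible, or
' the file declares object streams, which may hide page dictionaries.
Function getVisiblePageCount(ByVal sPath)
    Dim sRaw, bCompressed
    sRaw = readRaw(sPath)

    With (New RegExp)
        .IgnoreCase = False   ' PDF names are case-sensitive
        .Global = True
        .Pattern = "/Type\s*/ObjStm"
        bCompressed = .Test(sRaw)
        .Pattern = "/Type\s*/Page[^s]"
        getVisiblePageCount = .Execute(sRaw).Count
    End With

    If bCompressed Or getVisiblePageCount = 0 Then getVisiblePageCount = -1
End Function

'Usage : WScript.Echo getVisiblePageCount("C:\1.pdf")
```

Returning -1 instead of falling back to 1 makes the failure visible, which is one plausible reason some files are reported as single-page documents.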

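The only truly reliable solution is to do the laborious work of parsing the PDF file. I'm afraid I don't have a working VBS solution, but I've written a Delphi function that demonstrates how to do this (see http://www.angusj.com/delphitips/pdfpagecount.php).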
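
To give an idea of what parsing the file involves: a structural reader normally starts at the end of the file, where the startxref keyword is followed by the byte offset of the cross-reference section. The sketch below performs only that first step. The name getStartXref is hypothetical, and it is not a port of the Delphi function. Following the offset to the trailer, the catalog and the page tree's /Count (and decompressing cross-reference streams in newer files) is left out.

```vbs
Option Explicit

' Hypothetical sketch: returns the byte offset stored after the last
' "startxref" keyword, or -1 if the keyword is not found.
Function getStartXref(ByVal sPath)
    Dim sRaw, iPos, oMatches

    ' Read as single-byte text so character positions match byte offsets.
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = "x-ansi"
        .LoadFromFile sPath
        sRaw = .ReadText(-1)
        .Close
    End With

    getStartXref = -1
    iPos = InStrRev(sRaw, "startxref")   ' last occurrence wins
    If iPos = 0 Then Exit Function

    With (New RegExp)
        .Pattern = "^startxref\s+(\d+)"
        Set oMatches = .Execute(Mid(sRaw, iPos))
    End With

    ' CLng limits this sketch to offsets below 2 GB.
    If oMatches.Count > 0 Then getStartXref = CLng(oMatches(0).SubMatches(0))
End Function

'Usage : WScript.Echo getStartXref("C:\1.pdf")
```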

+0

Right, compression is one issue. There are others. PDF is not a format that can be treated as text. – mkl