pyPdf for a list of PDFs

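I have pyPdf working just fine for a single PDF file, but I can't get it to work for a list of files, or in a for loop over multiple PDF files, without it failing because a string is not callable. Any ideas on what I could use as a workaround?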

def getPDFContent(path):
    content = ""
    # Load PDF into pyPDF
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    # Iterate pages
    for i in range(0, pdf.getNumPages()):
        # Extract text from page and add to content
        content += pdf.getPage(i).extractText() + "\n"
    # Collapse whitespace
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content

#print getPDFContent(r"Z:\GIS\MasterPermits\12300983.pdf").encode("ascii", "ignore")


#find pdfs
for root, dirs, files in os.walk(folder1):
    for file in files:
        if file.endswith(('.pdf')):
            d=os.path.join(root, file)
            print getPDFContent(d).encode("ascii", "ignore")

Traceback (most recent call last): 
    File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 50, in <module> 
    print getPDFContent(d).encode("ascii", "ignore") 
    File "C:\Documents and Settings\dknight\Desktop\readpdf.py", line 32, in getPDFContent 
    pdf = pyPdf.PdfFileReader(file(path, "rb")) 
TypeError: 'str' object is not callable

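I was using a list, but I got the exact same error. I didn't think this was going to be a big deal, but as of now it is becoming one. I know I was able to work around a similar problem in ArcPy, but this is nothing close to that.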

Source

2013-07-23 Doug Knight

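It would help if you provided a complete program. Please reduce your program to the shortest complete, runnable program that demonstrates the problem, and paste it into your question. For more information about this debugging technique, see http://SSCCE.org.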

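At the point where you call 'file(path, "rb")', I suspect 'file' doesn't mean what you think it means. Try adding 'print type(file), file' immediately before the failing call. Do you use the variable name 'file' anywhere else in your program?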
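To see the kind of failure this comment is pointing at, here is a minimal sketch in Python 3. Python 3 no longer has a built-in 'file', so the sketch shadows 'len' instead; 'count_chars' and the rebinding are hypothetical stand-ins, not the asker's code. It assumes the lines run at module level, as in a plain script.

```python
def count_chars(text):
    # `len` is looked up when the function runs, not when it is defined.
    return len(text)

before = count_chars("abc")   # the built-in len is still visible here

# Module-level rebinding of a built-in name, analogous to `for file in files:`.
len = "12300983.pdf"
try:
    count_chars("abc")        # now tries to call a str
    error = None
except TypeError as exc:
    error = str(exc)

del len                       # drop the shadowing global; the built-in is visible again
after = count_chars("abc")

print(before, error, after)   # 3 'str' object is not callable 3
```

The function itself never changes; only the global name it depends on does, which is why the single-file call worked and the loop version did not.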

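Try not to use the names of built-in types as variable names. In Python 2, 'file' is a built-in type, and the module-level loop 'for file in files:' rebinds that name to a string, so the later call 'file(path, "rb")' inside getPDFContent tries to call a string.

Don't do this: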

for file in files:

Do this instead:

for myfile in files:
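Applied to the directory walk from the question, the rename could look like the sketch below. It is written for Python 3 and uses only the standard library; 'find_pdfs' and 'filename' are hypothetical names chosen for illustration, and the pyPdf call is left out so the block runs on its own.

```python
import os

def find_pdfs(folder):
    """Yield the full path of every file ending in .pdf under `folder`."""
    for root, dirs, filenames in os.walk(folder):
        # `filename` does not collide with any built-in name.
        for filename in filenames:
            if filename.endswith(".pdf"):
                yield os.path.join(root, filename)
```

Each yielded path can then be passed to getPDFContent in an ordinary for loop, and nothing in that loop needs to reuse the name 'file'.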

Source

2013-07-23 19:15:09
